Abstract

We study the problem of distributed traffic control in the partitioned plane,
where the movement of all entities (robots, vehicles, etc.) within each
partition (cell) is coupled. Establishing liveness in such systems is
challenging, but such analysis will be necessary to apply such distributed
traffic control algorithms in applications like coordinating robot swarms and
the intelligent highway system. We present a formal model of a distributed
traffic control protocol that guarantees minimum separation between entities,
even as some cells fail. Once new failures cease occurring, in the case of a
single target, the protocol is guaranteed to self-stabilize and the entities
with feasible paths to the target cell make progress towards it. For multiple
targets, failures may cause deadlocks in the system, so we identify a class of
non-deadlocking failures where all entities are able to make progress to their
respective targets. The algorithm relies on two general principles: temporary
blocking for maintenance of safety and local geographical routing for
guaranteeing progress. Our assertional proofs may serve as a template for the
analysis of other distributed traffic control protocols. We present simulation
results that provide estimates of throughput as a function of entity velocity,
safety separation, single-target path complexity, failure-recovery rates, and
multi-target path complexity.

Keywords: distributed systems, swarm robotics, formal methods

1. Introduction

Highway and air traffic flows are nonlinear switched dynamical systems
that give rise to complex phenomena such as abrupt phase transitions from fast 
to sluggish flow [1, 2, 3]. Our ability to monitor, predict, and avoid such phe- 
nomena can have a significant impact on the reliability and capacity of physical 
traffic networks. Traditional traffic protocols, such as those implemented for air 
traffic control, are centralized [4]: a coordinator periodically collects information
from the vehicles, decides and disseminates waypoints, and subsequently the 
vehicles try to blindly follow a path to the waypoint. Wireless vehicular net- 
works [5, 6, 7, 8] and autonomous vehicles [9, 10] present new opportunities 
for distributed traffic monitoring [11, 12, 13] and control [14, 15, 16, 17, 18, 19]. 
While these protocols may still rely on some centralized coordination, they 
should scale and be less vulnerable to failures compared to their centralized 
counterparts. In this paper, we propose a fault-tolerant distributed traffic con- 
trol protocol, formally model it, and formally prove its correctness. 

A traffic control protocol is a set of rules that determines the routing and 
movement of certain physical entities, such as vehicles, robots, or packages, 
over an underlying graph, such as a road network, air-traffic network, or
warehouse conveyor system. Any traffic control protocol should guarantee:
(a) (safety) that the entities always maintain some minimum physical
separation, and (b) (progress) that the entities eventually arrive at a given
destination (or target) vertex. In a distributed traffic control protocol, each
entity determines its own next waypoint, or each vertex in the underlying graph
determines the next waypoints for the entities in an appropriately defined
neighborhood.

In this paper, we study the problem of distributed traffic control in a parti- 
tioned plane where the motions of entities within a partition are coupled. The 
problem can be described as follows (refer to Figures 1 and 2). The environ- 
ment — the geographical space of interest — is partitioned into regions or cells. 
Each entity is assigned a certain type or color. For each color, there is one source 
cell and one target cell of the same color. The source cells produce entities of 
some color, and the target cells only consume entities of a particular color, so 
the goal is to move entities of color c to the target of color c. The motions of
all entities within a cell are coupled, in the sense that they all either move
identically, or they all remain stationary (we discuss the motivation for this below).
If some entities within some cell i touch the boundary of a neighboring cell j, 
those entities are transferred to j. Thus, the role of the distributed traffic con- 
trol protocol is to control the motion of the cells so that the entities (a) always 
have the required safe separation, and (b) reach their respective targets, when 
feasible. 

The coupling mentioned above that requires entities within a cell to move 
identically may appear strong at first sight. After all, under low traffic condi- 
tions, individual drivers control the movement of their cars within a particular 
region of the highway, somewhat independently of the other drivers in that re- 
gion. However, on highways under high-traffic, high-velocity conditions, it is 
known that coupling may emerge spontaneously, causing the vehicles to form 
a fixed lattice structure and move with near-zero relative speed [1, 20]. In other 
scenarios, coupling arises because passive entities are moved around by active 
cells. For example, this occurs with packages being routed on a grid of multi- 
directional conveyors [21, 22], and molecules moving on a medium according 
to some controlled chemical gradient. Finally, even where the entities are ac- 
tive and cells are not, the entities can cooperate to emulate a virtual active cell 
expressly for the purposes of distributed coordination. This idea has been
explored for mobile robot coordination in [23] using a cooperation strategy called
virtual stationary automata [24, 25]. 

In this paper, we present a distributed traffic control protocol that guar- 
antees safety at all times, even when some cells fail permanently by crashing. 
The protocol also guarantees eventual progress of entities toward their targets, 
provided (a) that there exists a path through non-faulty cells to the entities' re- 
spective targets, and (b) failures have not introduced unrecoverable deadlocks. 
Specifically, the protocol is self-stabilizing [26, 27], in that once new failures stop 
occurring, the composed system automatically returns to a state from which 
progress can be made. The algorithm relies on the following four mechanisms. 

(a) There is a routing rule to maintain local routing tables to each target at each 
non-faulty cell. This routing protocol is self-stabilizing and allows our pro- 
tocol to tolerate crash failures of cells. 

(b) There is a mutual exclusion and scheduling mechanism to ensure moving 
entities over distinctly colored overlapping paths do not introduce dead- 
locks. The locking and scheduling mechanism ensures one-way traffic can 
make progress over shared routes (traffic intersections). 

(c) There is a signaling rule between neighbors that guarantees safety while 
preventing deadlocks. Roughly speaking, the signaling mechanism at some 
cell fairly chooses among its neighboring cells that contain entities, indicat- 
ing if it is safe for one of these cells to apply a movement in the direction of 
the cell doing the signaling. This permission-to-move policy turns out to 
be necessary, because movement of neighboring cells may otherwise result 
in a violation of safety in the signaling cell, if entity transfers occur.

(d) The movement policy causes all entities on a cell to either move with the 
same constant velocity in the direction of their destination, or remain sta- 
tionary to ensure safety. This policy abstracts more complex motion mod- 
eling. 

We establish these safety and progress properties through systematic asser- 
tional reasoning. For safety properties, we establish inductive invariants and 
for stabilization we use global ranking functions. To show that all entities reach 
their destinations (when feasible), we use a combination of ranking functions 
and fairness-based reasoning on infinite executions. These proof techniques 
may serve as a template for the analysis of other distributed traffic control 
protocols. Our analysis is generally independent of the size of the environ- 
ment, number of cells, and number of entities. Additionally, only neighbor- 
ing cells communicate with one another and the communication topology is 
fixed (aside from failures). For these reasons, this problem can serve as a case 
study in automatic parameterized verification of distributed cyber-physical 
systems [28, 29, 30, 31]. 

We present simulation results that illustrate the influence (or the lack thereof)
of several factors on throughput. (a) Throughput decreases exponentially with
path length until saturation, at which point it decreases roughly linearly with
path length. (b) Throughput decreases roughly linearly with required safety
separation and cell velocity. (c) Throughput decreases roughly exponentially
until it saturates as a function of path complexity measured in number of turns
along a path. (d) Throughput decreases roughly exponentially with failure
rate, and increases linearly with recovery rates, under a model where crash
failures are not permanent and cells may recover from crashing. (e) Throughput
decreases roughly exponentially until it saturates as a function of the
percentage of overlapping cells between different colored targets.

Figure 1: Source cells (1 and 11) produce entities that flow toward the target
cells (18 and 5) of the appropriate color. Source-to-target paths overlap at
cells 8 and 13. In this execution, the blue entity on cell 7 is waiting for the
red entities to leave the overlapping cells.

Figure 2: If cell 10 moved its blue entities onto the shared one-lane "bridge"
(11, 12, 13, 14, 15), then all entities would be deadlocked.

Contributions over Previous Work. In previous work [32], we analyzed a similar
problem; this paper significantly generalizes those results.

(A) We consider general tessellations (including triangulations) that define the 
partitioning, while we considered uniform square partitions in [32]. We 
also present results on partitioning schemes that cannot work for our for- 
mulation of the problem. 

(B) We allow entities of multiple colors, each flowing to a different target, 
while in [32], we only allowed entities of one color, all of which flowed to 
the nearest target. This generalization lets source-to-target paths of differ- 
ent colors overlap, creating intersections, and requires several changes to 
the algorithm, including adding a mutual exclusion and scheduling mechanism
used to control traffic intersections. This generalization is significant
because it makes the problem applicable to a much wider class of
systems.

(C) We extended our simulation results to allow for these generalizations, and
characterized the cost on throughput due to the extra coordination re- 
quired to allow multiple colors. 

Paper Organization. The rest of the paper is organized as follows. First, Sec- 
tion 2 introduces the model of the physical system. Next in Section 3, we 
present the distributed traffic control algorithm. Then in Section 4, we define 
and prove the safety and progress properties. Subsection 4.1 establishes safety. 
Subsequently, we establish a progress property that shows entities eventually 
reach their targets in spite of failures (when possible). First in Subsection 4.2, 
it is shown that the routing protocol to find any target from any cell with a 
physical path through non-faulty cells to that target is self-stabilizing. Then 
in Subsection 4.3, we show how overlapping paths to different targets (traffic 
intersections) can be scheduled. Finally, in Subsection 4.4, it is shown that en- 
tities on any cell with a feasible physical path to their target eventually reach 
their target. Simulation results and interpretation are presented in Section 5, 
followed by a brief discussion of related work and further extensions, and a 
conclusion in Sections 6, 7, and 8. 

2. Physical System Model 

We describe the physical system in this section. For a set $K$, we define
$K_\bot = K \cup \{\bot\}$ and $K_\infty = K \cup \{\infty\}$. For
$N \in \mathbb{N}$, let $[N] = \{1, \ldots, N\}$. The $\|\cdot\|$
brackets are used for the Euclidean norm of a vector.

Partitioning. The system consists of $N$ convex polygonal cells partitioning a
polygonal environment. Let $ID = [N]$ be the set of unique identifiers for all
cells in the system. The planar environment $Env$ is some given simply
connected polygon. A partition $P$ of $Env$ is a set of closed, convex polygonal
cells $\{P_i\}_{i \in ID}$ such that:

(a) the interiors of the cells are pairwise disjoint, 

(b) the union of the cells is the original polygonal environment, and 

(c) cells only touch one another at a point or along an entire side. 

The first two conditions are the standard definition of a partition, while the 
third restricts any cell from being adjacent along one of its sides to more than 
one other cell. Thus, cell $i$ occupies a convex polygon $P_i$ in the Euclidean
plane. The boundary of cell $i$ is denoted by $\partial P_i$. We denote the
vertices (extreme points) of $P_i$ as $V_i$. We denote the number of sides of
$P_i$ as $ns(i)$. Let $Side(i,j) = \partial P_i \cap \partial P_j$ be the common
side of adjacent cells $i$ and $j$; we will refer to $Side(i,j)$ as both an
index and a line segment (set of points).

Communications. Cell $i$ is said to be a neighbor of cell $j$ if the boundaries
of the cells share a common side. The set of identifiers of all neighbors of
cell $i$ is denoted by $Nbrs_i$. This definition of neighbors can naturally be
represented as a graph, so let $\Delta$ be the worst-case diameter of such a
neighbor communication graph¹. For each cell $i \in ID$ and each neighboring
cell $j \in Nbrs_i$, let the side normal vector from $i$ to $j$, denoted
$n(i,j)$, be the unit vector orthogonal to $Side(i,j)$ and pointing into cell
$j$ from the common side $Side(i,j)$.
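
As a small illustration of the side normal vector, the following Python sketch computes a unit vector orthogonal to a shared side and oriented into cell j. The function name `side_normal`, the tuple representation of points, and the use of a reference point inside cell j (for example, its centroid) to choose the orientation are assumptions made for this example and are not part of the model.

```python
import math

def side_normal(a, b, inside_j):
    """Unit normal to segment a-b, oriented toward the point inside_j.

    a, b      -- endpoints (x, y) of the common side Side(i, j)
    inside_j  -- any point strictly inside cell j (hypothetical helper input)
    """
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("side endpoints must be distinct")
    # Rotate the side direction by 90 degrees to get one of the two normals.
    nx, ny = -dy / length, dx / length
    # Flip the normal if it points away from cell j.
    px, py = inside_j
    if (px - ax) * nx + (py - ay) * ny < 0:
        nx, ny = -nx, -ny
    return (nx, ny)
```

For a vertical side from (1, 0) to (1, 1) with cell j lying to its right, the returned vector points in the positive x direction.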

Each cell is controlled by software that implements the distributed traffic 
control algorithm described in the next section. We consider synchronous pro- 
tocols that operate in rounds. At each round, each cell exchanges messages 
bearing state information with its neighbors. Then, each cell updates its soft- 
ware state and decides the (possibly zero) velocity with which to move any 
entities on it. Until the beginning of the next round, the cells continue to oper- 
ate according to this velocity, which may lead to entity transfers. 

Entities. Each cell may contain a number of entities. Each entity occupies a
circular area and represents a physical object (or an overapproximation of one)
such as an aircraft, car, robot, or package. Every entity that may ever be in
the system has a unique identifier drawn from an index set $I$. This assumption
is for presentation only, and the algorithm does not rely on knowing entity
identifiers. For an entity $p \in I$, we denote the coordinates of its center by
$\mathbf{p} = (p_x, p_y) \in \mathbb{R}^2$.

The open circular area (disc) of radius $r$ centered at $\mathbf{p}$ is denoted
$B(\mathbf{p}, r)$. The radius of an entity is $l$, and $r_s$ is the minimum
required inter-entity safety gap. We define the total safety spacing radius as
$d = r_s + l$. For simplicity of presentation, we work with uniform entity
radii $l$ and safety gaps $r_s$. If they differ, we would take $l$ and $r_s$ to
be the maximums over all entities. We instantiate $B(\mathbf{p}, l)$, which
represents the physical space occupied by entity $p$, and we also instantiate
$B(\mathbf{p}, d)$, which is entity $p$'s total safety area.

Entity Colors, Source Cells, and Target Cells. There are $|C|$ types (or colors)
of entities, where $C$ is some finite, ordered set. The color of some entity
$p \in I$ is denoted as $color(p)$. For each $c \in C$, there is a source cell
$sid_c$ and a target cell $tid_c$. All other cells are ordinary cells. For
simplicity of presentation, we assume there is a unique source and target, but
the algorithms and the results generalize to the case where $sid_c$ and $tid_c$
are sets.

Entity $p$'s color $color(p)$ designates the target cell entity $p$ should
eventually reach. The source $sid_c$ produces entities of color $c$ and the
target $tid_c$ consumes entities of color $c$. The sets of target and source
identifiers are denoted $ID_T \subseteq ID$ and $ID_S \subseteq ID$,
respectively.

Entity Movement. All the entities within a cell move identically: either they
remain stationary or they move with some constant velocity $0 < v < l$ in the
direction of one of the sides of the cell. Thus $v$ is the maximum cell velocity,
or the greatest distance traveled by any entity over one synchronous round.
We require $v > 0$ to ensure progress. We require $v < l$ to ensure entities
do not collide when transfers occur. Cell velocity may differ in each cell so
long as each is upper bounded by $v$. This movement is determined by the
algorithm controlling each cell. When a moving entity touches a side of a cell,
it is instantaneously transferred to the neighboring cell beyond that side, so
that the entity is entirely contained in the new cell.

¹ The diameter of this graph is not static; it may change due to failures, but
the worst case is always a path graph, so $\Delta = N$.

Safety and Transfer Regions. The safety region on side $s$ of a cell is the area
within the cell where (the centers of) new entities entering the cell from side
$s$ can be placed. For a side $s$ of some cell $i$, the safety region on side
$s$, $SR_i(s)$, is the area on $P_i$ at most 'id distance measured orthogonally
from side $s$. Analogously, the transfer region on side $s$ of a cell is the
area within a cell where (the centers of) entities reside when those entities
will be transferred to the neighboring cell on that side. The transfer region
on side $s$, $TR_i(s)$, is the region in the partition $P_i$ at most $l$
distance measured orthogonally from side $s$. For a cell $i$, the transfer
region $TR_i$ and safety region $SR_i$ are respectively the unions of
$TR_i(s)$ and $SR_i(s)$ for each side $s$ of $P_i$. We refer to the inner
side(s) of $TR_i$, $TR_i(s)$, $SR_i$, or $SR_i(s)$ as the side(s) touching the
inside of the annulus, and denote them by $ITR_i$, $ITR_i(s)$, etc.

For example, in Figure 3, the transfer region for the square cell 3 is the
square annulus between the smaller cyan square and the larger blue square
(the boundary $\partial P_3$ of cell 3). Similarly, for the triangular cell 1 in
Figure 3, the transfer region is the triangular annulus between the smaller cyan
triangle and the larger blue triangle. Thus, the distance measured orthogonally
between the sides of the cyan polygons representing the boundary of the transfer
region, and the sides of the blue polygons is always $l$. In Figure 3, for the
square cell 3, the safety region is the square annulus between the smaller red
square and the larger blue square.
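
To make the region definitions concrete, here is a minimal Python sketch that tests whether an entity center lies in the transfer region of a convex cell. The helper names (`dist_to_side`, `in_side_region`, `in_transfer_region`) and the representation of a cell as a list of sides are hypothetical choices for this example; the point is assumed to already lie inside the cell. The region width is passed in as a parameter, so the same test applies to a safety region once its width is fixed.

```python
import math

def dist_to_side(p, a, b):
    """Orthogonal distance from point p to the line through side a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    # |cross product| / |side length| is the perpendicular distance.
    return abs(dx * (py - ay) - dy * (px - ax)) / math.hypot(dx, dy)

def in_side_region(p, a, b, width):
    """True if p (assumed inside the cell) is within `width` of side a-b."""
    return dist_to_side(p, a, b) <= width

def in_transfer_region(p, sides, l):
    """TR_i is the union over sides s of TR_i(s), each of width l."""
    return any(in_side_region(p, a, b, l) for a, b in sides)

# Example: a 4-by-4 square cell and entity radius l = 0.5.
square = [((0, 0), (4, 0)), ((4, 0), (4, 4)), ((4, 4), (0, 4)), ((0, 4), (0, 0))]
near_left_side = in_transfer_region((0.3, 2.0), square, 0.5)
in_the_middle = in_transfer_region((2.0, 2.0), square, 0.5)
```

In this example the first point is within 0.5 of the left side and so lies in the transfer region, while the center of the square does not.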

Geometric Assumptions. We assume that the polygonal environment $Env$ and
its partition $P$ have shapes and sizes such that each cell in the partition is
large enough for an entity to lie completely on it. In particular, we require
for each cell $i \in ID$ that the transfer region $TR_i$ is nonempty. We also
make the following assumptions to ensure that transferring entities between
cells is well-defined.

Assumption 1. (Projection Property): For each $i \in ID$, for each side $s$ of
$P_i$, there exists a constant vector field over $P_i$ that drives every point
in $P_i$ to some point on side $s$ without exiting $P_i$.

By definition, the cells form a partition. However, partially because there
is "empty space" between the transfer regions of the cells, the transfer regions
do not form a partition. Even if we remove this empty space by translating
the transfer regions so the sides of transfer regions of neighboring cells
coincide, they still may not form a partition (see Figure 3 for an example where
the transfer regions cannot form a partition). This is because, for the shared
side $s$ of neighboring cells $i$ and $j$, the inner sides of the transfer
regions on $P_i$ and $P_j$ may have different lengths, even though the shared
side $s$ has the same length for $P_i$ and $P_j$.

Figure 3: Safety regions (areas between red and blue) and transfer regions
(areas between cyan and blue) for the squares and triangles composing the snub
square tiling tessellation.

Figure 4: Blue and red paths overlap at cells 2, 3, and 4. Blue entities on
cells 7 and 8 have traversed the intersection and then the red source (4)
produces entities. Red and blue sources producing entities simultaneously would
cause a deadlock.

Assumption 2. (Transfer Feasibility): For any $i \in ID$ and any
$j \in Nbrs_i$, consider their common side $Side(i,j)$. The length of the inner
side $ITR_i(Side(i,j))$ line segment equals the length of the inner side
$ITR_j(Side(i,j))$ line segment.

3. Distributed Traffic Control Algorithm 

Next, we describe the discrete transition system $Cell_i$ that specifies the
software controlling an individual cell $P_i$ of the partition $P$.

Preliminaries. A variable is a name with an associated type. For a variable $x$,
its type is denoted by $type(x)$ and it is the set of values that $x$ can take.
A valuation (or state) for a set of variables $X$ is denoted by $\mathbf{x}$,
and is a function that maps each $x \in X$ to a point in $type(x)$. Given a
valuation $\mathbf{x}$ for $X$, the valuation for a particular variable
$v \in X$, denoted by $\mathbf{x}.v$, is the restriction of $\mathbf{x}$ to
$\{v\}$. The set of all possible valuations of $X$ is denoted by $val(X)$. Many
variables return cell identifiers that we use to access variables of other cells
using subscripts, and if the valuations of these variables are restricted to the
same state, we will drop the particular state on the subscripted variables for
more concise notation. For instance, suppose $\mathbf{x}.next_i \in ID$; then
$\mathbf{x}.next_{\mathbf{x}.next_i}$ would be written
$\mathbf{x}.next_{next_i}$.

A discrete transition system $\mathcal{A}$ is a tuple $(X, Q_0, A, \to)$, where:

(i) $X$ is a set of variables and $val(X)$ is called the set of states,

(ii) $Q_0 \subseteq val(X)$ is the set of start states,

(iii) $A$ is a set of transition names, and

(iv) $\to \subseteq val(X) \times A \times val(X)$ is a set of discrete
transitions. For $(\mathbf{x}_k, a, \mathbf{x}_{k+1}) \in \to$, we also use the
notation $\mathbf{x}_k \xrightarrow{a} \mathbf{x}_{k+1}$.

Figure 5: Example illustrating the computation of the color-shared cells and
shared colors, stored in the $pint_i[c]$ and $lcs_i[c]$ variables, respectively.
The color-shared cells are any cells on overlapping paths, and $lcs_i[c]$
corresponds to the colors of each disjoint set of color-shared cells.

Figure 6: Example illustrating the two fairness requirements (Assumption 4) for
proving liveness. Cells 9, 14, 19, and 20 failed, causing the original
source-target path for blue to change from cells 5, 10, 15, 20, 25. If source
cell 5 does not place new entities fairly, then entities on cells 10 and 15 may
never reach the target. A similar situation occurs with paths of multiple colors
in the lower part of the image.

An execution fragment of a discrete transition system $\mathcal{A}$ is a
(possibly infinite) sequence of states
$\alpha = \mathbf{x}_0, \mathbf{x}_1, \ldots$, such that for each index $k$
appearing in $\alpha$, $(\mathbf{x}_k, a, \mathbf{x}_{k+1}) \in \to$ for some
$a \in A$. An execution is an execution fragment with $\mathbf{x}_0 \in Q_0$.
A state $\mathbf{x}$ is said to be reachable if there exists a finite execution
that ends in $\mathbf{x}$. $\mathcal{A}$ is said to be safe with respect to a
set $S \subseteq val(X)$ if all reachable states are contained in $S$. A set
$S$ is said to be stable if, for each $(\mathbf{x}, a, \mathbf{x}') \in \to$,
$\mathbf{x} \in S$ implies that $\mathbf{x}' \in S$. $\mathcal{A}$ is said to
stabilize to $S$ if $S$ is stable and every execution fragment eventually
enters $S$.
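
For a finite state space, the notions of reachable, safe, and stable above can be checked by direct enumeration. The sketch below is only an illustration under that finiteness assumption; the class name `TransitionSystem`, its method names, and the representation of the transition relation as a `step(state, action)` function are hypothetical and not taken from the paper.

```python
from collections import deque

class TransitionSystem:
    """A finite discrete transition system given by start states, actions,
    and a function step(state, action) returning the successor states."""

    def __init__(self, start_states, actions, step):
        self.start_states = set(start_states)
        self.actions = list(actions)
        self.step = step

    def successors(self, state):
        """Yield every (action, next_state) pair enabled in `state`."""
        for a in self.actions:
            for nxt in self.step(state, a):
                yield a, nxt

    def reachable(self):
        """All states that end some finite execution (breadth-first search)."""
        seen = set(self.start_states)
        frontier = deque(seen)
        while frontier:
            s = frontier.popleft()
            for _, nxt in self.successors(s):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen

    def is_safe(self, S):
        """Safe with respect to S: every reachable state is in S."""
        return self.reachable() <= set(S)

    def is_stable(self, S):
        """S is stable: every transition starting in S ends in S."""
        S = set(S)
        return all(nxt in S for s in S for _, nxt in self.successors(s))

# Toy system: a counter that starts at 3 and is decremented down to 0.
countdown = TransitionSystem({3}, ["dec"], lambda s, a: [max(s - 1, 0)])
```

In the toy system, the set {0, 1} is stable (decrementing never leaves it) but the system is not safe with respect to it, because the start state 3 lies outside.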

Cells. We assume messages are delivered within bounded time and computa- 
tions are instantaneous. Under these assumptions, the system can be modeled 
as a collection of discrete transition systems. The overall system is obtained 
by composing the transition systems of the individual cells. We first present 
the discrete transition system corresponding to each cell, and then describe the 
composition. 

The variables associated with each $Cell_i$ are as follows, with initial values
of the variables shown in Figure 7 using the ':=' notation.

(a) $Entities_i$ is the set of identifiers for entities located on cell $i$.
Cell $i$ is said to be nonempty if $Entities_i \neq \emptyset$.

(b) $color_i$ designates the entity colors on the cell, or $\bot$ if there are
none².

(c) $failed_i$ indicates whether or not $i$ has failed.

(d) $NEPrev_i$ are the nonempty neighbors attempting to move entities (of any
color) toward cell $i$.

(e) $token_i$ is a token used for fairness to indicate which neighbor may move
toward $i$.

(f) $signal_i$ is the identifier of a neighbor of $Cell_i$ that has permission
to move toward $Cell_i$.

Additionally, the following variables are defined as arrays for each color
$c \in C$. The notation $next_i[c]$ means the $c$-th entry of the $next$
variable of cell $i$, and so on for the other variables.

(a) $next_i[c]$ is the neighbor towards which $i$ attempts to move entities of
color $c$.

(b) $dist_i[c]$ is the estimated distance (the number of cells) to the nearest
target cell consuming entities of color $c$.

(c) $lock_i[c]$ is a boolean variable for a lock of color $c$ that some cells
require to be able to move entities.

(d) $path_i[c]$ is the set of cell identifiers from any source of color $c$ (and
any nonempty cell with entities of color $c$) to the target of color $c$. This
variable and the next two are local variables, but they are storing some global
information.

(e) $pint_i[c]$ is the set of cell identifiers in traffic intersections with
cells of color $c$ (where $path_i[c]$ and $path_i[d]$ have nonempty intersection
for some $d \neq c$).

(f) $lcs_i[c]$ is the set of colors that are involved in traffic intersections
with the color $c$ path.

When clear from context, the subscripts in the names of the variables are
dropped. A state of $Cell_i$ refers to a valuation of all these variables, i.e.,
a function that maps each variable to a value of the corresponding type. The
complete system is an automaton, called System, consisting of the composition of
all the cells. A state of System is a valuation of all the variables for all the
cells. We refer to states of System with bold letters $\mathbf{x}$,
$\mathbf{x}'$, etc.

² It will be established that cells contain entities of only a single color; see
Invariant 3.

    Entities : Set[P] := {}
    NEPrev : Set[ID_⊥] := {}
    signal, token : ID_⊥ := ⊥
    color : C_⊥ := ⊥
    failed : B := false
    next : [C → ID_⊥], init ∀c ∈ C, next[c] := ⊥
    dist : [C → N_∞], init ∀c ∈ C, dist[c] := ∞
    path : [C → Set[ID_⊥]], init ∀c ∈ C, path[c] := {}
    pint : [C → Set[ID_⊥]], init ∀c ∈ C, pint[c] := {}
    nlock : [C → B], init ∀c ∈ C, nlock := true
    lock : [C → B], init ∀c ∈ C, lock := false
    lcs : [C → Set[C]], init ∀c ∈ C, lcs[c] := {}

    fail(i)
        eff failed := true
            for each c ∈ C
                dist[c] := ∞; next[c] := ⊥

    update
        eff Route; Lock; Signal; Move

Figure 7: Specification of $Cell_i$ listing its variables, initial conditions,
and transitions. Subscripts are dropped for readability.



Variables $token_i$, $failed_i$, $lock_i$, and $NEPrev_i$ are private to
$Cell_i$, while $Entities_i$, $dist_i$, $next_i$, $path_i$, $color_i$, and
$signal_i$ can be read by neighboring cells of $Cell_i$. This has the following
interpretation for an actual message-passing implementation. At the beginning
of each round, $Cell_i$ broadcasts messages containing the values of these
variables and receives similar values from its neighbors. Then, the computation
of this round updates the local variables for each cell based on the values
collected from its neighbors.

Variable $Entities_i$ is a special variable because it can also be written to by
the neighbors of $i$. This is how we model transfer of entities between cells.
For a state $\mathbf{x}$, for some $a \in A$ such that
$\mathbf{x} \xrightarrow{a} \mathbf{x}'$, for some $i \in ID$, for some
$j \in Nbrs_i$, for some entity $p \in \mathbf{x}.Entities_i$, entity $p$
transfers from cell $i$ to $j$ when $p \in \mathbf{x}'.Entities_j$. We use the
notation $p'$ to denote the state of entity $p$ at $\mathbf{x}'$, where
$\mathbf{x} \xrightarrow{a} \mathbf{x}'$ for some $a \in A$.

Actions for the Composed System. System is a discrete transition system
modeling the composition of all the cells, and has two types of actions: fails
and updates. A fail(i) transition models the crash failure of the $i$-th cell
and sets $failed_i$ to true, $dist_i[c]$ to $\infty$ for each $c \in C$, and
$next_i[c]$ to $\bot$ for each $c \in C$. Cell $i$ is called faulty if
$failed_i$ is true, otherwise it is called non-faulty. The sets of identifiers
of all faulty and non-faulty cells at a state $\mathbf{x}$ are denoted by
$F(\mathbf{x})$ and $NF(\mathbf{x})$, respectively. A faulty cell does nothing:
it never moves and it never communicates³.

An update transition models the evolution of all non-faulty cells over one
synchronous round. For readability, we describe the state change caused by an
update transition as a sequence of four functions (subroutines), where for each
non-faulty $i$,

(a) Route computes the variables $dist_i$ and $next_i$,

(b) Lock computes the variables $path_i$, $pint_i$, $lcs_i$, and $lock_i$,

(c) Signal computes (primarily) the variable $signal_i$, and

(d) Move computes the new positions of entities.

We note that in the single-color case considered in [32], the Lock subroutine is
unnecessary.

    if ¬failed_i then
        color_i := {c ∈ C : ∃p ∈ Entities_i ∧ color(p) = c}
        if i ∉ ID_T then
            for each c ∈ C
                dist_i[c] := (min_{j ∈ Nbrs_i} dist_j[c]) + 1
                if dist_i[c] = ∞ then next_i[c] := ⊥
                else next_i[c] := argmin_{j ∈ Nbrs_i} ⟨dist_j[c], j⟩

Figure 8: Route function for $Cell_i$. This function computes a minimum
distance vector routing spanning tree, composed of non-faulty cells, for each
color, rooted at each target.

³ $dist_i = \infty$ can be interpreted as $i$'s neighbors not receiving a timely
response from $i$.

The entire update transition is atomic, so there is no possibility to interleave 
fail transitions between the subroutines of update. Thus, the state of System at 
(the beginning of) round $k + 1$ is obtained by applying these four functions to
the state at round k. Now we proceed to describe the distributed traffic control 
algorithm that is implemented through these functions. 

Route. For each cell and each color, the Route function (Figure 8) constructs
a distance-based routing table to the target cell of that color. This relies only
on neighbors' estimates of distance to the target. Recall that failed cells have
$dist[c]$ set to $\infty$ for every color $c \in C$. From a state $\mathbf{x}$,
for each $i \in NF(\mathbf{x})$, the variable $dist_i[c]$ is updated as 1 plus
the minimum value of $dist_j[c]$ for each neighbor $j$ of $i$. If this results
in $dist_i[c]$ being infinity, then $next_i[c]$ is set to $\bot$, but otherwise
it is set to be the identifier with the minimum $dist[c]$, where ties are broken
with neighbor identifiers.
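
The Route update can be sketched for a single color as one synchronous round over all cells. This is an illustrative sketch rather than the paper's specification: the function name `route_round`, the dictionary-based data layout, the use of `None` for the undefined value, and the assumption that a non-faulty target keeps its distance estimate fixed at 0 are all choices made for this example.

```python
import math

INF = math.inf

def route_round(nbrs, dist, target, failed):
    """One synchronous Route round for a single color.

    nbrs   -- dict: cell id -> list of neighbor ids
    dist   -- dict: cell id -> current distance estimate (INF if unknown)
    target -- id of the target cell for this color
    failed -- set of crashed cell ids
    Returns new (dist, nxt) dicts; nxt[i] is None where no next hop exists.
    """
    new_dist, nxt = {}, {}
    for i in nbrs:
        if i in failed:
            # Effect of fail(i): infinite distance and no next hop.
            new_dist[i], nxt[i] = INF, None
        elif i == target:
            # Assumption: the target's own estimate is fixed at 0.
            new_dist[i], nxt[i] = 0, None
        else:
            # Minimum over (dist_j, j) pairs, so ties go to the smaller id.
            best_d, best_j = min(
                ((dist[j], j) for j in nbrs[i]), default=(INF, None)
            )
            new_dist[i] = best_d + 1
            nxt[i] = None if new_dist[i] == INF else best_j
    return new_dist, nxt

# Four cells in a ring (1-2, 2-4, 4-3, 3-1) with target cell 4.
ring = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
```

Because every cell reads its neighbors' estimates from the previous round, distance information spreads outward from the target by one hop per round. In the ring example, cell 1 initially routes through cell 2 (the tie with cell 3 is broken by identifier) and switches to cell 3 a couple of rounds after cell 2 crashes.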

Next, we introduce some definitions used to relate the system state to the
variables used in the algorithm. For a state $\mathbf{x}$, we inductively define
the color $c$ target distance $\rho_c$ of a cell $i \in ID$ as the smallest
number of non-faulty cells between $i$ and $tid_c$:

$$
\rho_c(\mathbf{x}, i) \triangleq
\begin{cases}
\infty & \text{if } \mathbf{x}.failed_i, \\
0 & \text{if } i = tid_c \wedge \neg \mathbf{x}.failed_i, \\
1 + \min_{j \in \mathbf{x}.Nbrs_i} \rho_c(\mathbf{x}, j) & \text{otherwise.}
\end{cases}
$$

A cell is said to be target-connected to color $c$ if $\rho_c$ is finite. We
define

$$
TC(\mathbf{x}, c) \triangleq \{ i \in NF(\mathbf{x}) \mid \rho_c(\mathbf{x}, i) < \infty \}
$$

as the set of cells that are target-connected to $tid_c$.

For a state $x$ and a color $c \in C$, we define the routing graph as
$G_R(x, c) = (V_R(x, c), E_R(x, c))$, where the vertices and directed edges are,
respectively,

$$V_R(x, c) \triangleq NF(x) \quad \text{and}$$

$$E_R(x, c) \triangleq \{(i, j) : i, j \in V_R(x, c),\ \rho_c(x, j) = \rho_c(x, i) + 1\}.$$

Under this definition, $G_R(x, c)$ is a spanning tree rooted at $tid_c$. We will
show that the graph induced by the $next_i[c]$ variables stabilizes to the routing
graph $G_R(x, c)$ at some state $x$. We previously introduced $\Delta$ as the
worst-case diameter of the communication graph, and will refer to $\Delta(x)$ as
the exact diameter at some state $x$.
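As an illustration of the definitions of the target distance and of target-connected cells, the following sketch computes both by breadth-first search from the target over non-faulty cells. The function names and the adjacency-dictionary representation are assumptions for this example; the breadth-first search is used here as one way to evaluate the inductive definition, not as part of the protocol itself.

```python
from collections import deque
import math

def target_distance(nbrs, failed, target):
    """Return dict cell -> target distance; infinite for failed or cut-off cells."""
    rho = {i: math.inf for i in nbrs}
    if target not in failed:
        rho[target] = 0
        queue = deque([target])
        while queue:
            i = queue.popleft()
            for j in nbrs[i]:
                # Only non-faulty cells that have not been reached yet are labeled.
                if j not in failed and rho[j] == math.inf:
                    rho[j] = rho[i] + 1
                    queue.append(j)
    return rho

def target_connected(rho):
    """Cells with a finite target distance."""
    return {i for i, d in rho.items() if d < math.inf}

# Line of four cells with the target at cell 0 and cell 2 failed.
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rho = target_distance(nbrs, failed={2}, target=0)
tc = target_connected(rho)
```

Here cell 3 is non-faulty but cut off from the target by the failed cell 2, so it is not target-connected.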

Lock. The Lock function (Figure 9) executes after Route, and schedules traf- 
fic over intersections (the cells where source-to-target paths of different colors 
overlap). To avoid deadlock scenarios, Lock maintains an invariant that entities
of at most one color are on these intersections. 

Moving entities over intersections requires some global coordination as il- 
lustrated by the following analogy. Consider the policy used to coordinate cars 
going in opposite directions over a one-lane bridge (see Figure 2), where there 
is a traffic signal on each side of the bridge. The algorithm chooses one traf- 
fic light, allowing some cars to safely travel over the bridge in one direction. 
After some time, the algorithm switches the lights (first turning green to red, 
and after the road is clear, turning red to green) allowing traffic to flow in the 
opposite direction. Then this process repeats. 

Two parts of the previous example require global coordination and are
included in the Lock function. The first is how to choose the direction in which
cars are allowed to travel — this is accomplished through the use of a mutual 
exclusion algorithm. The second is when to allow cars to travel in the opposite 
direction — this is accomplished by determining when the intersection is empty. 
We now describe this global coordination more formally. 

For defining the locking algorithm, we first define intersections. For this we
introduce the notion of an entity graph. Cell $i$ is said to be in the entity
graph of some color $c$ at state $x$ if one of the following conditions holds:
(a) $i$ is $sid_c$, (b) in state $x$, $i$ has entities of color $c$, or (c) in
state $x$, $i$ is the neighbor closest to $tid_c$ of a cell already in the entity
graph. Formally, we define the color $c$ entity graph at state $x$ as
$G_E(x, c) = (V_E(x, c), E_E(x, c))$, which is the following subgraph of the
color $c$ routing graph $G_R(x, c)$. The vertices of $G_E(x, c)$ are inductively
defined as

$$V_E(x, c) \triangleq \{i \in NF(x) : i = sid_c \vee x.color_i = c \vee (\exists j \in V_E(x, c).\ (i, j) \in E_R(x, c))\}.$$

The edges of $G_E(x, c)$ are
$E_E(x, c) \triangleq \{(i, j) \in V_E(x, c) \times V_E(x, c) : (i, j) \in E_R(x, c)\}$.
For example, if all cells are empty, then $V_E(x, c)$ is the sequence of cell
identifiers defined by following the minimum distance (as defined by $\rho_c$)
from the source to the target of color $c$. That is, each $G_E(x, c)$ is a simple
path graph from source to target*.

 1  if ¬failed_i
 2    for each c ∈ C
 3      if i = sid_c ∨ color_i = c ∨ i ∈ path_i[c] then
 4        path_i[c] := path_i[c] ∪ {i} ∪ {next_i[c]}
 5      // gossip the entity graph
 6      for each j ∈ Nbrs_i, path_i[c] := path_i[c] ∪ path_j[c]
 7      // compute the set of color-shared cells
 8      pint_i[c] := {j ∈ path_i[c] ∩ path_i[d] : ∃ c ≠ d ∈ C}
 9      if pint_i[c] ≠ ∅
10        lcs_i[c] :=
11          {d ∈ C : c ≠ d ∧ path_i[c] ∩ path_i[d] ≠ ∅}
12      // graphs stabilized and i needs a lock for color c
13      if round > 2Δ ∧ i ∈ pint_i[c] ∧ ¬lock_i[c]
14        Initiate mutual exclusion algorithm between all
15        color-shared cells in pint_i[c] using lcs_i as input
16        Eventually, a color d is returned.
17        On return, if d = c then lock_i[c] := true
18      // detect if color-shared cells are empty
19      if round > 2Δ ∧ i ∈ pint_i[c] ∧ lock_i[c]
20        Initiate distributed snapshot algorithm to decide
21        if all color-shared cells are empty after previously
22        being nonempty with entities of color c.
23        On return, if all cells are empty then
24          lock_i[c] := false

Figure 9: Lock function for $Cell_i$. This function computes the color-shared
cells (the cells in intersections) for each color, and then ensures liveness by
giving a lock to only one color on each intersection.

Now we describe how the entity graph of each color $c$ is computed by each
cell $i$ as the $path_i[c]$ variable. If $i$ is on the entity graph of color $c$,
then we add $i$ and $i$'s next variable for color $c$ to the entity graph (see
Figure 9, lines 3 and 4). Once the $next_i[c]$ variables stabilize, and after an
additional number of rounds on the order of the diameter, the variable $path_i[c]$
contains all the entity graphs since we gossip these graphs (line 6). That is, the
graph formed by the $path_i[c]$ variables stabilizes to equal $G_E(x, c)$, and
contains the sequence of identifiers from any source or nonempty cell of color $c$
to the target of color $c$.

Next, the variable $pint_i[c]$ is computed to be the set of cell identifiers on
the color $c$ entity graph that overlap with the entity graph of any other color
(line 8). The cells involved in such non-empty intersections represent physical
traffic intersections, and are called color-shared cells. These cells require
coordinated locking for traffic flow to progress. Cell $i$ is in $pint_i[c]$ if
and only if it will need a lock for color $c$.

Formally, we define the $c$ color-shared cells, for a state $x$, for any
$c \in C$, as

$$CSC(x, c) \triangleq \{V_E(x, c) : \exists d \in C.\ c \neq d \wedge V_E(x, c) \cap V_E(x, d) \neq \emptyset\}.$$



*Once cells have failed, this may stabilize to be a tree from any cell with
entities of color $c$ to the target of color $c$.

 1  if ¬failed_i ∧ round > 2Δ then
 2    cn := {d ∈ C : ∃j ∈ Nbrs_i s.t. next_j[d] = i
 3                   ∧ color_j = d}
 4
 5    if color_i = ⊥ then c := choose from cn
 6    else c := color_i
 7
 8    NEPrev_i :=
 9      {j ∈ Nbrs_i : next_j[c] = i ∧ Entities_j ≠ ∅}
10
11    if token_i = ⊥ then
12      token_i := choose from NEPrev_i
13
14    let j = token_i
15    if ∀p ∈ Entities_i : p ∉ SR(i, j)
16       ∧ (color_i = ⊥ ∨ color_i = color_j)
17       ∧ (j ∈ pint_i[c] ⇒ lock_j[c])
18    then
19      signal_i := j
20      if |NEPrev_i| > 1 then
21        token_i := choose from NEPrev_i \ {j}
22      elseif |NEPrev_i| = 1 then
23        token_i := choose from NEPrev_i
24      else token_i := ⊥
25    else signal_i := ⊥; token_i := j

Figure 10: Signal function for $Cell_i$. Cell $i$ signals fairly to some neighbor
$j$ if it is safe for $j$ to move its entities toward $i$.



In Figure 1, these are cells 8 and 12. The $pint_i[c]$ variables stabilize to
equal $CSC(x, c)$, at some state $x$, for any color $c$.
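To make the entity graph and the color-shared cells concrete, the sketch below follows next pointers from seed cells (a source or a cell holding entities) to the target, and then marks the cells of one color's entity graph that also lie on another color's entity graph. It follows the prose reading of color-shared cells and line 8 of Figure 9; the function names, the seed list, and the per-color dictionaries are assumptions for illustration only.

```python
def entity_graph(next_hop, seeds, target):
    """Cells visited by following next pointers from each seed to the target."""
    vertices = set()
    for i in seeds:
        # Stop at the target, at a missing pointer, or when joining a known path.
        while i is not None and i not in vertices:
            vertices.add(i)
            if i == target:
                break
            i = next_hop.get(i)
    return vertices

def color_shared_cells(graphs):
    """graphs: dict color -> entity-graph vertex set.

    Returns dict color -> cells of that color's graph that also lie on the
    entity graph of some other color.
    """
    return {
        c: {i for i in cells
            if any(i in other for d, other in graphs.items() if d != c)}
        for c, cells in graphs.items()
    }

# Red travels 1 -> 2 -> 3 -> 4, blue travels 5 -> 2 -> 3 -> 6, green is separate.
red = entity_graph({1: 2, 2: 3, 3: 4}, seeds=[1], target=4)
blue = entity_graph({5: 2, 2: 3, 3: 6}, seeds=[5], target=6)
green = {7, 8}
csc = color_shared_cells({"red": red, "blue": blue, "green": green})
```

In this example cells 2 and 3 form the intersection shared by red and blue, while green needs no lock.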

Next, we need to determine the colors that will need to coordinate to schedule
traffic through the color-shared cells. Then, a mutual exclusion algorithm is
initiated between all cells for each disjoint set of cell colors in $pint_i[c]$.
Formally, we define the $c$ shared colors, for a state $x$, for any $c \in C$, as

$$SC(x, c) \triangleq \{d \in C : c \neq d \wedge CSC(x, d) = CSC(x, c)\}.$$

The $lcs_i[c]$ variables stabilize at some state $x$ to equal $SC(x, c)$, for any
color $c$.

In general, up to $|C|$ colors could be involved in an intersection, or any
smaller subset of them. For instance, consider Figure 5 with 6 colors at some
state $x$. Here, the blue and red entity graphs overlap, green and blue entity
graphs overlap, but red and green do not, and independently, the purple and
yellow entity graphs overlap (that is, not with blue, red, nor green), but no
colors overlap with brown. Then $SC(x, c)$ is {blue, red, green} for $c$ equal to
blue, red, or green, $SC(x, c)$ is {yellow, purple} for $c$ equal to yellow or
purple, and $SC(x, c)$ is empty for $c$ equal to brown. Two mutual exclusion
algorithms
would be initiated, one with blue, red, and green as the input set of values, 
and another with yellow and purple as the input set. Upon these two instances 
terminating, one element of the first set, say green, would be chosen and given 
a lock, and one element, say yellow, of the second set would also be given a 
lock. The entities of these colors progress over the color-shared cells toward 
their intended targets. Finally, once the color-shared cells are empty again, 
green and yellow would each be removed from the respective input sets for 
fairness, and another mutual exclusion algorithm is initiated. 
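The grouping in the six-color example can be sketched as finding connected components of an "overlap" relation between entity graphs, so that blue, red, and green end up in one mutual exclusion instance even though red and green do not overlap directly. This transitive grouping is an interpretation of the worked example, and the function name and the cell numbers below are invented for illustration.

```python
def lock_groups(graphs):
    """Group colors whose entity graphs overlap directly or through a chain.

    graphs: dict color -> set of cells on that color's entity graph.
    Returns a list of color sets; colors that overlap with nothing are omitted.
    """
    colors = list(graphs)
    seen = set()
    groups = []
    for c in colors:
        if c in seen:
            continue
        stack, component = [c], set()
        while stack:
            a = stack.pop()
            if a in component:
                continue
            component.add(a)
            # Push every color whose graph shares at least one cell with a's graph.
            stack.extend(b for b in colors
                         if b not in component and graphs[a] & graphs[b])
        seen |= component
        if len(component) > 1:
            groups.append(component)
    return groups

graphs = {
    "blue": {1, 2, 3}, "red": {1, 9}, "green": {3, 8},
    "purple": {20, 21}, "yellow": {21, 22}, "brown": {30},
}
groups = lock_groups(graphs)
```

Each returned group would be the input set of one mutual exclusion instance; brown appears in no group because it shares no cells with any other color.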

Signal. The Signal function (Figure 10) executes after Lock. It is the key part of
the protocol for maintaining safe entity separations, guaranteeing each cell has
entities of only a single color, and ensuring progress of entities to the target.
Roughly, each cell implements this through the following policies: (a) only
accept entities from a neighbor when it is safe to do so, (b) only accept entities
with the same color as the entities currently on the cell (or an arbitrary color
if the cell is empty), (c) if a lock is needed, then only let entities move if it is
acquired, and (d) ensure fairness by providing opportunities infinitely often
for each nonempty neighbor to make progress.

First, $i$ computes a temporary variable $cn$, which is the set of colors of the
neighbors that have entities of some color and whose corresponding next variable
is set to cell $i$. Next, cell $i$ picks a color $c$ from this set if cell $i$ is
empty, or the color of its own entities if it is nonempty, and will attempt to
allow some cell with this chosen color to move toward itself. Then, cell $i$ sets
$NEPrev_i$ to be the subset of $Nbrs_i$ for which next has been set to $i$ and
Entities is nonempty. If $token_i$ is $\bot$, then it is set to some arbitrary
value in $NEPrev_i$, but it continues to be $\bot$ if $NEPrev_i$ is empty.
Otherwise, $token_i = j$ for some neighbor $j$ of $i$ with nonempty $Entities_j$.
This is accomplished through the conditional in line 6 as a step in guaranteeing
fairness.

It is then checked whether there is any entity with center $p$ in the safety
region of $Cell_i$ on the side corresponding to $token_i$. If there is such an
entity, then $signal_i$ is set to $\bot$, which blocks the neighboring cell with
identifier $token_i$ from moving its entities in the direction of $i$, thus
preventing entity transfers and ensuring safety. Otherwise, if there is no entity
with center in the safety region on side $token_i$, then $signal_i$ is set to
$token_i$ to allow $token_i$ to move its entities toward $i$. Subsequently,
$token_i$ is updated to a value in $NEPrev_i$ that is different from its previous
value, if that is possible according to the rules just described (lines 20-22).
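The token handling just described can be sketched as a small function that returns the new signal and token for one cell in one round. The pseudocode says only "choose", so the deterministic rotation by identifier used here is an assumption, as are the function names and the predicate arguments standing in for the safety-region, color, and lock checks.

```python
def next_in_rotation(current, candidates):
    """Smallest candidate id above `current`, wrapping to the smallest overall."""
    ordered = sorted(candidates)
    later = [k for k in ordered if k > current]
    return later[0] if later else ordered[0]

def signal_step(ne_prev, token, region_clear,
                color_ok=lambda j: True, lock_ok=lambda j: True):
    """Return (signal, new_token) for one cell; None stands for the bottom value.

    ne_prev:      set of nonempty neighbors whose next pointer is this cell
    token:        neighbor currently holding the token, or None
    region_clear: j -> True if no entity center is in the safety region on side j
    """
    if token is None:
        if not ne_prev:
            return None, None
        token = min(ne_prev)          # arbitrary choice in the source
    j = token
    if region_clear(j) and color_ok(j) and lock_ok(j):
        if len(ne_prev) > 1:
            new_token = next_in_rotation(j, ne_prev - {j})
        elif len(ne_prev) == 1:
            new_token = min(ne_prev)
        else:
            new_token = None
        return j, new_token
    # Blocked: keep the token so the same neighbor is served first later.
    return None, j

# Three waiting neighbors are served in turn while the safety region stays clear.
served, token = [], None
for _ in range(3):
    signal, token = signal_step({1, 2, 3}, token, region_clear=lambda j: True)
    served.append(signal)
```

When the check fails, the token is retained rather than rotated, which is what prevents a blocked neighbor from losing its turn.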

Move. Finally, the Move function (Figure 11) models the physical movement of
all the entities on cell $i$ over a given round. For cell $i$, let $j$ be
$next_i[c]$, where $c$ is $color_i$ (which may be $\bot$ if cell $i$ has no
entities). Every entity in $Entities_i$ moves in the direction of $j$ if and only
if $signal_j$ is set to $i$. The direction followed from cell $i$ to $j$ is
$u(i, j)$, which is any vector satisfying Assumption 1. For example, for a square
(or rectangular) cell $i$, one choice for $u(i, j)$ is the unit vector orthogonal
to $Side(i, j)$ and pointing into $j$. In the case of an equilateral triangular
cell $i$, one choice for $u(i, j)$ is also any orthogonal vector pointing into
$j$.

The movement toward cell $j$ may lead to some entities crossing the boundary
of $Cell_i$ into $Cell_j$, in which case they are removed from $Entities_i$. If
$j$ is not the target matching the transferred entities' color, then the removed
entities are added to $Entities_j$. In this case (line 9), any transferred entity
$p$ is placed so that $D_l(p)$ touches a single point of (is tangent to)
$Side(i, j)$, the shared side of cells $i$ and $j$, and lies on the inner side of
the transfer region of cell $j$ on side $Side(i, j)$. Resetting entity positions
is a conservative approximation to the actual physical movement of entities. If
$j$ is the target matching the transferred entities' color, then the removed
entities are not added to any cell and thus no longer exist in System.
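A simplified sketch of this movement and transfer step is given below for one special case: a square cell whose receiving neighbor lies to its right, so that the direction of travel is the positive x axis. The function name, the local coordinate frames, and the transfer test (the disc of radius l reaching the shared side) are assumptions chosen to keep the example short; they are not the Move function of Figure 11.

```python
def move_right(entities, v, l, width=1.0):
    """Advance entity centers by v along +x inside a cell of the given width.

    entities: list of (x, y) centers in the cell's local frame, 0 <= x <= width
    v:        distance moved per round
    l:        entity radius
    Returns (stay, handed_over). Handed-over entities are expressed in the
    neighbor's local frame, where the shared side is the line x = 0, and are
    reset so that their disc is tangent to that side.
    """
    stay, handed_over = [], []
    for x, y in entities:
        x_new = x + v
        if x_new + l >= width:
            # The disc touches or crosses the shared side: transfer and reset.
            handed_over.append((l, y))
        else:
            stay.append((x_new, y))
    return stay, handed_over

stay, handed_over = move_right([(0.25, 0.5), (0.75, 0.5)], v=0.125, l=0.125)
```

The reset to x = l in the neighbor's frame corresponds to placing the disc tangent to the shared side, which is why the text calls the repositioning a conservative approximation of the physical motion.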

The source cells $i \in ID_S$, in addition to the above, add a finite number of
entities in each round to $Entities_i$, such that the addition of these entities
does not violate the minimum gap between entities at $Cell_i$. In the remainder
of the paper, we will analyze System to show that in spite of failures, it
maintains safety and liveness properties to be introduced in the next section.

 1  let c = color_i
 2  let j = next_i[c]
 3  if ¬failed_i ∧ signal_j = i then
 4    for each p ∈ Entities_i
 5      p := p + v u(i, j)
 6      if p ∈ TR_i then
 7        Entities_i := Entities_i \ {p}
 8        if j ≠ tid_c then
 9          Entities_j := Entities_j ∪ {p}
10          // point on shared side along movement vector
11          p := (p + v u(i, j)) ∩ Side(i, j)
12          // inner transfer region along orthogonal line
13          p := (p + n(i, j)) ∩ ITR_j(Side(i, j))

Figure 11: Move function for $Cell_i$. If $i$ has received a signal to move from
$j$, it updates the positions of all entities on it to move in $j$'s direction,
which may lead to some entities transferring from cell $i$ to $j$.

4. Analysis of Distributed Traffic Control 

In this section, we present an analysis of the safety and liveness properties 
of System. Roughly, the safety property requires that there is a minimum gap 
between entities on any cell, and the liveness property requires that all entities 
that reside on cells with feasible paths to the corresponding target eventually 
reach that target. 

4.1. Safety and Collision Avoidance 

A state is safe if, for every cell, the boundaries of all entities in the cell
are separated by a distance of $r_s$. For any state $x$ of System, we define:

$$Safe_i(x) \triangleq \forall p, q \in x.Entities_i.\ p \neq q \Rightarrow \|p - q\| > 2l + r_s, \quad \text{and}$$

$$Safe(x) \triangleq \forall i \in ID,\ Safe_i(x).$$

This definition allows entities in different cells to be closer than $2l + r_s$
apart, but their centers will be spaced by at least $2l$. We proceed by proving
some preliminary properties of System that will be used for proving Safe is an
invariant.
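The per-cell safety predicate can be checked directly from entity centers, as in the sketch below. The names `safe_cell` and `safe_system` and the list-of-tuples representation are assumptions for illustration; the check mirrors the pairwise center-distance condition in the definition above.

```python
import math
from itertools import combinations

def safe_cell(centers, l, r_s):
    """True if every pair of distinct entity centers is more than 2l + r_s apart."""
    return all(math.dist(p, q) > 2 * l + r_s
               for p, q in combinations(centers, 2))

def safe_system(cells, l, r_s):
    """cells: dict cell id -> list of entity centers; safe if every cell is safe."""
    return all(safe_cell(centers, l, r_s) for centers in cells.values())

cells = {
    "a": [(0.0, 0.0), (1.0, 0.0)],   # centers 1.0 apart
    "b": [],                         # an empty cell is vacuously safe
}
ok = safe_system(cells, l=0.1, r_s=0.05)
```

Note that the predicate is evaluated cell by cell, so two entities in different cells are never compared, matching the remark that entities in different cells may be closer together.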

The first property asserts that entities cannot come close enough to the
sides of cells to reside on multiple cells. This is because any entity whose
boundary touches the side of a cell is transferred to the neighboring cell on
that side (if one exists), and then the entity's position is reset to be completely
within the new cell. Assumption 2 restricts the allowed partitions to ensure
entity transfers are well-defined. For instance, some of the cells in the snub
square tiling in Figure 3 do not satisfy Assumption 2. Consider an entity transfer
from
cell 3 to cell 5. There is no constant vector connecting the transfer regions of 
cell 3 to those of cell 5. This is because the side length of the transfer region of 
the triangular cell 5 is shorter than the side length of the transfer region of the 
square cell 3. However, in a transfer from cell 1 to cell 2 or vice-versa, the side 
lengths are the same. We also note that the assumption is only necessary for 
entity transfers from a cell with a longer transfer side length to a neighboring 
cell with smaller corresponding transfer side length. For example, a transfer 
from cell 5 to cell 3 is feasible. 

Under Assumption 2, we have the following invariant, which states that
the $l$-ball around each entity in a cell is completely contained within the cell.

Invariant 1. In any reachable state $x$, $\forall i \in ID$, $\forall p \in x.Entities_i$, $D_l(p) \subseteq Cell_i$.

The next invariant states that cells' Entities sets are disjoint. This is
immediate from the Move function since entities are only added to one cell's
Entities upon being removed from a different cell's Entities.

Invariant 2. In any reachable state $x$, for any $i, j \in ID$, if $i \neq j$,
then $x.Entities_i \cap x.Entities_j = \emptyset$.

The following invariant states that cells contain entities of a single color
in spite of failures. This follows from the Signal routine in Figure 10, where
line 16 requires that if some neighbor $j$ is attempting to move entities toward
cell $i$, then the color of $i$ is either $\bot$ or equal to the color of $j$.

Invariant 3. In any reachable state $x$, for all $i \in ID$, for all
$p, q \in x.Entities_i$, $color(p) = color(q)$.

Next, we define a predicate that states that if $signal_i$ is set to the
identifier of some neighbor $j \in Nbrs_i$, then there is a large enough area from
the common side between $i$ and $j$ where no entities reside in $Cell_i$. Recall
that $Side(i, j)$ is the line segment shared between neighboring cells $i$ and
$j$. For a state $x$, $H(x) \triangleq \forall i \in ID, \forall j \in Nbrs_i$,
if $x.signal_i = j$, then the following holds:

$$\forall p \in x.Entities_i, \quad \min_{x \in Side(i, j)} \|p - x\| > 3d.$$

$H(x)$ is not an invariant property because once entities move the property may
be violated. However, for proving safety, all that needs to be established is that
at the point of computation of the signal variable this property holds. The next
key lemma states this.

Lemma 4. For all reachable states $x$, $H(x) \Rightarrow H(x_S)$, where $x_S$ is
the state obtained by applying the Route, Lock, and Signal functions to $x$.

Proof. Fix a reachable state $x$, an $i \in ID$, and a $j \in Nbrs_i$ such that
$x.signal_i = j$. Let $x_R$ be the state obtained by applying the Route function
to $x$, $x_L$ be the state obtained by applying the Lock function to $x_R$, and
$x_S$ be the state obtained by applying the Signal function to $x_L$.

First, observe that both $H(x_R)$ and $H(x_L)$ hold. This is because the Route
and Lock functions do not change any of the variables involved in the definition
of $H(\cdot)$. Next, we show that $H(x_L)$ implies $H(x_S)$. If
$x_S.signal_i \neq j$ then the statement holds vacuously. Otherwise,
$x_S.signal_i = j$, and since (a) $H(x_L)$ holds and (b) Figure 10, line 6 is
satisfied, we have that $H(x_S)$ holds. □

The following lemma asserts that if there is a cycle of length two formed by
the signal variables, which could occur due to failures, then entity transfers
cannot occur between the involved cells in that round.

Lemma 5. Let $x$ be any reachable state and $x'$ be a state that is reached from
$x$ after a single update transition (round). If $x.signal_i = j$ and
$x.signal_j = i$, then $x.Entities_i = x'.Entities_i$ and
$x.Entities_j = x'.Entities_j$.

Proof. No entities enter either $x'.Entities_i$ or $x'.Entities_j$ from any other
$m \in Nbrs_i$ or $n \in Nbrs_j$ since $x.signal_i = j$ and $x.signal_j = i$. It
remains to be established that $\nexists p \in x.Entities_j$ such that
$p' \in x'.Entities_i$ where $p = p'$, or vice-versa. Suppose such a transfer
occurs. For the transfer to have occurred, $p$ must be such that
$p' = (p_x, p_y) + v\,u(i, j)$ by Figure 11, line 5. But for $x.signal_i = j$ to
be satisfied, it must have been the case that Di{p) Pi ^ 9 by Figure 10, line 6 
and since $v < l$, a contradiction is reached. □

Using the previous results, we now prove that System preserves safety even 
when some cells fail. 

Theorem 1. In any reachable state $x$ of System, $Safe(x)$.

Proof. The proof is by standard induction over the length of any execution of
System. The base case is satisfied by the assumption that initial states
$x \in Q_0$ satisfy $Safe(x)$. For the inductive step, consider any reachable
states $x$, $x'$ and an action $a \in A$ such that $x \xrightarrow{a} x'$. Fix
$i \in ID$ and, assuming $Safe_i(x)$, we show that $Safe_i(x')$. If $a = fail_i$,
then $Safe_i(x')$ since no entities move.

For $a = update$, there are two cases to consider by Invariant 2. First,
$x'.Entities_i \subseteq x.Entities_i$, that is, no new entities were added to
$i$, but some may have transferred off $i$. There are two sub-cases: if
$x'.Entities_i = x.Entities_i$, then all entities in $x.Entities_i$ move
identically and the spacing between two distinct entities
$p, q \in x'.Entities_i$ is unchanged. Let $j = next_i[c]$ where $c = color_i$ by
Invariant 3. That is, $\forall p, q \in x.Entities_i$,
$\forall p', q' \in x'.Entities_i$ such that $p' = p$ and $q' = q$ and where
$p \neq q$,
$\|(p'_x, p'_y) - (q'_x, q'_y)\| = \|(p_x, p_y) + v\,u(i, j) - ((q_x, q_y) + v\,u(i, j))\|$
(Figure 11, line 5). It follows by the inductive hypothesis that
$\|(p'_x, p'_y) - (q'_x, q'_y)\| > d$. The second sub-case arises if
$x'.Entities_i \subsetneq x.Entities_i$; then $Safe_i(x')$ is either vacuously
satisfied or it is satisfied by the same argument just stated.

The second case is when $x'.Entities_i \not\subseteq x.Entities_i$, that is,
there was at least one entity transferred to $i$. Consider any such transferred
entity $p' \in x'.Entities_i$ where $p' \notin x.Entities_i$. There are two
sub-cases. The first sub-case is when $p'$ was added to $x'.Entities_i$ because
$i$ is a source, that is, $i \in ID_S$. In this case, the specification of the
source cells states that the entity $p'$ was added to $x'.Entities_i$ without
violating $Safe_i(x')$, and the proof is complete. Otherwise, $p'$ was added to
$x'.Entities_i$ by some neighbor $j \in x.Nbrs_i$, so $p' \in x.Entities_j$ but
$p' \notin x.Entities_i$, and $p' \in x'.Entities_i$ but
$p' \notin x'.Entities_j$. From line 9 of Figure 11, we have that
$(p'_x, p'_y)$ is the position obtained by applying ResetEntity to $p$. The fact
that $p'$ transferred from $Cell_j$ in $x$ to $Cell_i$ in $x'$ implies that
$x.next_j = i$ and $x.signal_i = j$; these are necessary conditions for the
transfer by Figure 10, line 15. Thus, applying the predicate $H(x)$ at state $x$
and by Lemma 4, it follows that for every $q \in x.Entities_i$,
$(q_x, q_y) \notin FR(i, j)$. It must now be established that if $p'$ is
transferred to $x'.Entities_i$, then every $q' \in x'.Entities_i$, where
$q' \neq p'$, satisfies $(q'_x, q'_y) \notin FR(i, j)$, which means that any
entity $q$ already on $i$ did not move toward the transferred entity $p$ that is
now on $i$. This follows by application of Lemma 5, which states that if entities
on adjacent cells move towards one another simultaneously, then a transfer of
entities cannot occur. This implies that the discs of all entities $q'$ in
$x'.Entities_i$ are farther than $r_s$ from the borders of any transferred entity
$p'$, implying $Safe_i(x')$. Finally, since $i$ was chosen arbitrarily,
$Safe(x')$. □

Theorem 1 shows that System is safe in spite of failures. 

4.2. Stabilization of Spanning Routing Trees 

Next, we show, under some additional assumptions, that once new failures
cease to occur, System recovers to a state where each non-faulty cell with a
feasible path to its target computes a route toward it. This route stabilization is
then used in showing that any entity on a non-faulty cell with a feasible path 
to its target makes progress toward it. Our analysis relies on the following as- 
sumptions on cell failures and the placement of new entities on source cells. 
The first assumption states that no target cells fail, and is reasonable and nec- 
essary because if any target cell did fail, entities of that color obviously cannot 
make progress. 

Assumption 3. No target cells $t \in ID_T$ may fail.

The next assumption ensures source cells place entities fairly so that they 
may not perpetually prevent any neighboring cell or any color-shared cell from 
making progress. The assumption is needed because it provides a specification 
of how the source cells behave, which has not been done so far. The assumption 
is reasonable because it essentially says that traffic is not produced perpetually 
without any break. 

Assumption 4. (Fairness): Source cells place new entities without perpetually
blocking either (i) any of their nonempty non-faulty neighbors, or (ii) any cell
$i \in CSC(x, c)$, where $c$ is the color of source $s$.

Formally, the first fairness condition states, for any execution $\alpha$ of
System, for any color $c \in C$, for any source cell $s = sid_c$, if there exists
an $i \in Nbrs_s$ such that for every state $x$ in $\alpha$ after a certain
round, $i \in x.NEPrev_s$, then eventually $signal_s$ becomes equal to $i$ in
some round of $\alpha$. The second fairness condition states, for any execution
$\alpha$ of System, for any state $x \in \alpha$, for any color $c \in C$, for
any source cell $sid_c$, if there exists an $i \in NF(x)$ such that
$i \in CSC(x, c)$, and for every state $x$ in $\alpha$ after a certain round, if
cell $i$ is nonempty, then eventually $signal_j$ becomes equal to $i$ in some
round of $\alpha$, where $j$ is a neighbor of $i$. Such conditions can be ensured
if we suppose some oracle placing entities on source cells follows the same
round-robin-like scheme defined in the Signal subroutine in Figure 10. Scenarios
where each of these cases can arise are illustrated in Figure 6.

A fault-free execution fragment $\alpha$ is a sequence of states starting from
$x$ along which no fail(i) transitions occur. That is, a fault-free execution
fragment is an execution fragment with no new failure actions, although there may
be existing failures at the first state $x$ of $\alpha$, so $F(x)$ need not be
empty. Throughout the remainder of this section, we will consider fault-free
executions that satisfy Assumptions 3 and 4.

Lemma 6. Consider any reachable state $x$ of System, any color $c \in C$, and any
$i \in TC(x, c) \setminus \{tid_c\}$. Let $h = \rho_c(x, i)$. Any fault-free
execution fragment $\alpha$ starting from $x$ stabilizes within $h$ rounds to a
set of states $S$ with all elements satisfying:

$dist_i[c] = h$, and

$next_i[c] = i_n$, where $\rho_c(x, i_n) = h - 1$.

Proof. Fix an arbitrary state $x$, a fault-free execution fragment $\alpha$
starting from $x$, a color $c \in C$, and $i \in TC(x, c) \setminus \{tid_c\}$.
We have to show that (a) the set of states $S$ is closed under update transitions
and (b) after $h$ rounds, the execution fragment $\alpha$ enters $S$.

First, by induction on $h$ we show that $S$ is stable. Consider any state
$y \in S$ and a state $y'$ that is obtained by applying an update transition to
$y$. We have to show that $y' \in S$. For the base case, $h = 1$, so
$y.dist_i[c] = 1$ and $y.next_i[c] = tid_c$. From lines 5 and 7 of the Route
function in Figure 8, and since there is a unique $tid_c$ for each color $c$, it
follows that $y'.dist_i[c]$ remains 1 and $y'.next_i[c]$ remains $tid_c$. For the
inductive step, the inductive hypothesis is, for any given $h$, if for any
$j \in NF(x)$, $y.dist_j[c] = h$ and $y.next_j[c] = m$, for some $m \in ID$ with
$\rho_c(x, m) = h - 1$, then

$y'.dist_j[c] = h$ and $y'.next_j[c] = m$.

Now consider $i$ such that $\rho_c(y, i) = \rho_c(y', i) = h + 1$. In order to
show that $S$ is closed, we have to assume that $y.dist_i[c] = h + 1$ and
$y.next_i[c] = m$, and show that the same holds for $y'$. Since
$\rho_c(y', i) = h + 1$, $i$ does not have a neighbor with target distance
smaller than $h$. The required result follows from applying the inductive
hypothesis to $m$ and from lines 5 and 7 of Figure 8.

Second, we have to show that starting from $x$, $\alpha$ enters $S$ within $h$
rounds. Once again, this is established by induction on $h$, which is
$\rho_c(x, i)$. Consider any state $y$ such that $\rho_c(x, i) = \rho_c(y, i)$.
The base case only includes the target distances satisfying
$h = \rho_c(y, i) = 1$, and follows by instantiating $i_n = tid_c$. For the
inductive case, assume for the inductive hypothesis that at some state $y$,
$y.dist_j[c] = h$ and $y.next_j[c] = i_n$ such that $\rho_c(y, i_n) = h - 1$,
where $i_n$ is the minimum identifier among all such cells (since we used cell
identifiers to break ties). Observe that there is one such $j \in y.Nbrs_i$ by
the definition of TC. Then at state $y'$, by the inductive hypothesis and lines 5
and 7 of Figure 8,

$y'.dist_i[c] = y'.dist_j[c] + 1 = h + 1$. □

The following corollary of Lemma 6 states that, after new failures cease
occurring, for all target-connected cells, the graph induced by the $next[c]$
variables stabilizes to the color $c$ routing graph, $G_R(x, c)$, within a number
of rounds at most the diameter of the communication graph, which is bounded by
$\Delta(x)$.

Corollary 7. Consider any execution $\alpha$ of System with an arbitrary but
finite sequence of fail transitions. For any state $x \in \alpha$ at least
$2\Delta(x)$ rounds after the last fail transition, for any $c \in C$, every cell
$i$ target-connected to color $c$ has $x.next_i[c]$ equal to the identifier of
the next cell along such a route.

The following corollary of Lemma 6 states that within $2\Delta(x)$ rounds
after routes stabilize, for each color $c \in C$, the identifiers in the
$path_i[c]$ variables equal the vertices of the color $c$ entity graph
$G_E(x, c)$. The result follows since routes stabilize, Lock is a function of the
next and path variables only, and the $path_i$ variables are gossiped in
Figure 9, line 6.

Corollary 8. Consider any execution $\alpha$ of System with an arbitrary but
finite sequence of fail transitions. For any state $x \in \alpha$ at least
$2\Delta(x)$ rounds after the last fail transition, for every $c \in C$, every
cell $i$ target-connected to color $c$ has $path_i[c] = V_E(x, c)$.
The next corollary of Lemma 6 states that eventually the values of the $pint[c]$
variables equal the set of color-shared cells $CSC(x, c)$ for any cell $i$ and
color $c$. This is important because the mutual exclusion algorithm is initiated
between the cells in $pint[c]$ (Figure 9, line 13).

Corollary 9. Consider any execution $\alpha$ of System with an arbitrary but
finite sequence of fail transitions. For any state $x \in \alpha$ at least
$2\Delta(x)$ rounds after the last fail transition, for every $c \in C$, every
cell $i$ target-connected to color $c$ has $x.pint_i[c] = CSC(x, c)$.

4.3. Scheduling Entities through Color-Shared Cells 

In this section, we show that there is at most a single color on the set of 
color-shared cells if there are no failures. We then show that any cell that re- 
quests a lock eventually gets one, under an additional assumption that failures 
do not cause entities of more than one color to reside on the set of color-shared 
cells. Because failures cause the routing graphs and entity graphs to change, 
the color-shared cells that could previously be scheduled may now be dead- 
locked. Additionally, because we separately lock each disjoint set of color- 
shared cells to allow entities of some color to flow toward their target, it could 
be the case that, in the intermediate states between when the failure occurred
and when routes have stabilized, entities were allowed to move in a way that
deadlocks the system. Such deadlocks could be avoided if a centralized coordinator
informs every non-faulty cell to disable their signals when a failure is detected. 
The assumption states that with failures, the color-shared cells either all have 
the same-colored entities, or have no entities (and combinations thereof). 

Assumption 5. Feasibility of Locking after Failures: For any reachable state $x$,
for any color $c \in C$, consider the color-shared cells $CSC(x, c)$. For all
distinct cells $i, j \in CSC(x, c)$, either $x.color_i = x.color_j$ or
$x.color_i = \bot$.

The next lemma states that without failures, there are entities of at most a 
single color on the set of color-shared cells. The result is not an invariant be- 
cause failures may cause the set of color-shared cells to change, resulting in 
deadlocks, which is why we need Assumption 5. By Invariant 3, we know that 
there are entities of at most a single color in each cell, so the following
invariant is stated in terms of the color $color_i$ of each cell. We emphasize
that Assumption 5 is unnecessary if there are no failures, as the algorithm
ensures there
are entities of at most a single color on the color-shared cells by the following 
lemma. 

Lemma 10. If there are no failures, for any reachable state $x$, for any
$c \in C$, for any $i \in CSC(x, c)$, if $\neg x.lock_i[c]$, then for all
$j \in CSC(x, c)$, we have $x.color_j \neq c$.

Proof. The proof shows an inductive invariant, supposing no failures occur. For
the initial state, all cells are empty, so we have $x.color_i = \bot$ for any
$i \in ID$. For the inductive step, we only consider update actions by
assumption. In the pre-state, we have $\neg x.lock_i[c]$ and, for all
$j \in CSC(x, c)$, $x.color_j \neq c$. Fix some $c \in C$ and some
$i \in CSC(x, c)$. For any subsequent state $x'$, if $x'.lock_i[c]$, the result
follows vacuously. If $\neg x'.lock_i[c]$, we must show for all
$j \in CSC(x, c)$ that $x.color_j \neq c$, so fix some $j \in CSC(x, c)$. If
$j \in CSC(x', c)$, the result follows, since by the inductive hypothesis,
$x.color_j = x'.color_j \neq c$. If $j \notin CSC(x', c)$, the condition in
Signal (Figure 10, line 17) cannot be satisfied since $\neg x'.lock_i[c]$. Thus,
no cell with entities of color $c$ could move toward any cell in $CSC(x', c)$,
and we have $x'.color_j \neq c$. □

The next lemma states that without failures, or with "nice" failures as
described by Assumption 5, any cell requesting a lock of some color will
eventually get it, and thus may move entities onto the color-shared cells.

Lemma 11. For any reachable state $x$ satisfying Assumption 5, for any $c \in C$, for
any $i \in NF(x)$, if $i \in x.pint[c]$ and all cells in $CSC(x, c)$ are empty, then eventually
a state $x'$ is reached where $x'.lock_i[c]$.

Proof. By correctness of the mutual exclusion algorithm, eventually a color $d \in
SC(x', c)$ is returned and $x'.lock_i[d] = true$ (Figure 9, line 13). If $c = d$, then
the result follows. If $c \neq d$, by Lemma 10 and Assumption 5, we know that no
other color aside from $c$ has entities on any cell $j \in CSC(x', c)$. The next time
the mutual exclusion algorithm is initiated, $d$ is excluded from the input set to
the mutual exclusion algorithm (Figure 9, line 19), and by repeated argument,
eventually $lock_i[c]$. □
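
The "repeated argument" in this proof can be illustrated with a small sketch. It models only the exclusion idea: each time a color is served, it is removed from the input set of the next mutual exclusion round, so every requesting color is eventually chosen. The function `lock_order` and the `mutex` callback are hypothetical stand-ins for the mutual exclusion step of Figure 9. The check that the color-shared cells are empty between rounds is abstracted away.

```python
from typing import Callable, List, Set


def lock_order(requesting: Set[str],
               mutex: Callable[[Set[str]], str]) -> List[str]:
    """Order in which colors obtain the lock when served colors are excluded."""
    remaining = set(requesting)
    order: List[str] = []
    while remaining:
        winner = mutex(remaining)  # any color from the current input set
        if winner not in remaining:
            raise ValueError("mutex must return a member of its input set")
        order.append(winner)
        remaining.discard(winner)  # exclude the served color next time
    return order


# Using min as an arbitrary (deterministic) choice rule for the example.
order = lock_order({"red", "green", "blue"}, min)
```

Whatever choice rule is passed in, each requesting color appears exactly once in the result, so a color that loses one round is served after at most as many rounds as there are competing colors.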
4.4. Progress of Entities towards their Targets

Using the results from the previous sections, we show that once new failures
cease occurring, for every color $c \in C$, every entity of color $c$ on a cell
that is target-connected eventually gets to the target of color $c$. The result
(Theorem 2) uses two lemmas which establish that, along every infinite execution
with a finite number of failures, every nonempty target-connected cell gets
permission to move infinitely often (Lemma 13), and a permission to move allows
the entities on a cell to make progress towards the target (Lemma 12).

For the remainder of this section, we fix an arbitrary infinite execution $\alpha$ of
System with a finite number of failures, satisfying Assumption 5. Let $x_f$ be any
state of System at least $2\Delta(x)$ rounds after the last failure, and $\alpha'$ be the infinite
failure-free execution fragment $x_f, x_{f+1}, \ldots$ of $\alpha$ starting from $x_f$. For any
$c \in C$, observe that the set of target-connected cells remains constant starting
from $x_f$ for the remainder of the execution. That is, $TC(x_f, c) = TC(x_{f+1}, c) =
TC(\ldots, c)$, so we fix $TC(c) = TC(x_f, c)$.

Lemma 12. For any $c \in C$, for any $i \in TC(c)$, for some $j \in x_f.Nbrs_i$, if $k > f$,
$x_k.signal_j = i$, and $x_k.next_i[c] = j$, for any entity $p \in x_k.Entities_i$, let the distance
function be defined by the lexicographically ordered tuple

$$R(x, p) = (\rho_c(x, i), \|d_s - p\|),$$

where $d_s$ is the point on the shared side $Side(i, j)$ defined by the line passing through
$p$ with direction $u(i, j)$. Then, $R(x_{k+1}, p) < R(x_k, p)$.

Proof. The first case is when no entity transfers from $i$ to $j$ in the $(k+1)$-th round:
if $p' \in x_{k+1}.Entities_i$ such that $p' = p$, then $\|d_s - p'\| < \|d_s - p\|$. In this case,
the result follows since a velocity $v > 0$ is applied towards cell $j$ by Move
in Figure 11, line 5. The second case is when some entity $p$ transfers from $i$ to
$j$, so $p' \in x_{k+1}.Entities_j$ such that $p' = p$. In this case, we have $\rho_c(x_k, j) <
\rho_c(x_k, i)$, since the distance between $j$ and $tid_c$ is smaller than the distance
between $i$ and $tid_c$, as routes have stabilized by Lemma 6. In either case,
$R(x_{k+1}, p) < R(x_k, p)$, so entity $p$ is closer to the appropriate target. □
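
The two cases of the proof can be made concrete with a small numerical sketch. Python compares tuples lexicographically, which matches the ordering used for the distance function. The function name `progress_measure`, the hop counts, and the coordinates are illustrative assumptions, not values from the paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def progress_measure(hops: int, p: Point, d_s: Point) -> Tuple[int, float]:
    """Pair (hop distance of the cell to the target, distance from p to d_s)."""
    return (hops, math.dist(d_s, p))


# Case 1: the entity stays on its cell but moves toward the shared side.
before = progress_measure(3, (0.2, 0.5), (1.0, 0.5))
after_move = progress_measure(3, (0.4, 0.5), (1.0, 0.5))

# Case 2: the entity transfers to the next cell, which is one hop closer.
after_transfer = progress_measure(2, (1.05, 0.5), (2.0, 0.5))
```

Both `after_move < before` and `after_transfer < before` hold. In the second case the geometric component grows, since the entity faces a new shared side, but the hop count dominates the comparison.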

The following lemma states that all cells with a path to the target receive a 
signal to move infinitely often, so Lemma 12 applies infinitely often. 

Lemma 13. For any $c \in C$, consider any $i \in TC(c) \setminus \{tid_c\}$. For all $k > f$, if
$x_k.Entities_i \neq \emptyset$, then $\exists k' > k$ such that $x_{k'}.signal_{next_i[c]} = i$.

Proof. Fix some $c \in C$. Since $i \in TC(c)$, there exists $h < \infty$ such that for
all $k > f$, $\rho_c(x_k, i) = h$. We prove the lemma by induction on $h$. The base
case is $h = 1$. Fix $i$ and instantiate $k' = f + ns(tid_c)$. By Lemma 6, for any
$t \in ID_T$, for all non-faulty $i \in Nbrs_t$, $x_f.next_i[c] = t$ since $k > f$. For all
$k > f$, if $x_k.Entities_i \neq \emptyset$, then $signal_{tid_c}$ changes to a different neighbor with
entities every round. It is thus the case that $|x_k.NEPrev_{tid_c}| \leq ns(tid_c)$ and
since $Entities_{tid_c} = \emptyset$ always, exactly one neighbor satisfies the conditional
of Figure 10, line 6 in any round, then within $ns(tid_c)$ rounds, $signal_{tid_c} = i$.
For the inductive case, let $k_s = k + h$ be the step in $\alpha$ after which all
non-faulty $a \in Nbrs_i$ have $x_{k_s}.next_a[c] = i$ by Lemma 6. Also by Lemma 6, $\exists m \in
Nbrs_i$ such that $x_{k_s}.dist_m < x_{k_s}.dist_i$, implying that after $k_s$, $|x_{k_s}.NEPrev_i| <
ns(i)$ since $x_{k_s}.next_i = m$ and $x_{k_s}.next_m \neq i$. By the inductive hypothesis,
$x_{k_s}.signal_{next_i[c]} = i$ infinitely often. If $i \in ID_S$, then entity initialization
does not prevent $x_k.signal_i = a$ from being satisfied infinitely often by the
second assumption introduced in Subsection 4.2. It remains to be established
that $signal_i = a$ infinitely often. Let $a \in x_{k_s}.NEPrev_i$ where $\rho_c(x_{k_s}, a) = h + 1$.

In any of the following cases, if $i \in x_{k_s}.pint[c]$ and all cells $j \in CSC(x_{k_s}, c)$
are empty, then by Lemma 11, eventually $lock_i[c]$. If $|x_{k_s}.NEPrev_i| = 1$, then
since the inductive hypothesis gives $signal_{next_i[c]} = i$ infinitely often, Lemma 12
applies infinitely often, and thus $Entities_i = \emptyset$ infinitely often, finally implying
that $signal_i = a$ infinitely often.

If $|x_{k_s}.NEPrev_i| > 1$, there are two sub-cases. The first sub-case is when
no entity enters $i$ from some $d \neq a \in x_{k_s}.NEPrev_i$, which follows by the same
reasoning used in the $|x_{k_s}.NEPrev_i| = 1$ case. The second sub-case is when
an entity enters $i$ from $d$, in which case it must be established that $signal_i =
a$ infinitely often. This follows since if $x_{k'}.token_i = a$ where $k' > k_t > k_s$
and $k_t$ is the round at which an entity entered $i$ from $d$, and the appropriate
case of Lemma 4 is not satisfied, then $x_{k'+1}.signal_i = \bot$ and $x_{k'+1}.token_i = a$
by Figure 10, line 25. This implies that no more entities enter $i$ from any cell $d$
satisfying $d \neq a$. Thus $token_i = a$ infinitely often follows by the same reasoning
as in the $|x_{k_s}.NEPrev_i| = 1$ case. □

The final theorem establishes that entities on any cell in TC(c) eventually 
reach the target in a'. 

Theorem 2. For any $c \in C$, consider any $i \in TC(c)$. $\forall k > f$, $\forall p \in x_k.Entities_i$,
$\exists k' > k$ such that $p \in x_{k'}.Entities_{next_i[c]}$.

Proof. Fix $c \in C$, $i \in TC(c)$, a round $k > f$ and $p \in x_k.Entities_i$. Let $h =
\max_{j \in TC(c)} \rho_c(x_f, j)$, which is finite. By Lemma 6, at every round after $k_s = k + h$,
for any $i \in TC(c)$, the sequence of identifiers $\beta = i, x_{k_s}.next_i[c], x_{k_s}.next_{next_i[c]}[c],
\ldots$ forms a fixed path to $tid_c$. Applying Lemma 13 to $i \in TC(c)$ shows that
there exists $k_m > k_s$ such that $x_{k_m}.signal_{next_i[c]} = i$. Now applying Lemma 12
to $x_{k_m}$ establishes movement of $p$ towards $x_{k_m}.next_i[c]$, which is also $x_{k_s}.next_i[c]$.
Lemma 13 further establishes that this occurs infinitely often, thus there is a
round $k' > k_m$ such that $p$ gets transferred to $x_{k'}.Entities_{next_i[c]}$. □

By an induction on the sequence of identifiers in the path $\beta$, it follows that
entities on any cell in $TC(c)$ eventually get consumed by the target.

Summary of Results 

In this section, we established several invariant properties culminating in a
proof of safety of the system, meaning that entities never collide, in spite of
failures. Next, we proved that the routing algorithm used to construct paths to
the destinations is self-stabilizing in spite of arbitrary crash failures. We then
showed, under an assumption that failures do not introduce deadlock scenarios,
that the locking algorithm allows multi-color flows to take control of
intersections (color-shared cells) in a mutually exclusive manner. Finally, under
a fairness assumption, we established the main progress property through two
results: any cell gets permission to move infinitely often, and any cell with
permission to move decreases the distance of the entities on it from their
destination.

5. Simulation Experiments 

We have performed several simulation studies of the algorithm to evaluate
its throughput. In this section, we discuss the main findings
with illustrative examples taken from the simulation results. We implemented
the simulator in MATLAB, and all the partition figures displayed in the paper
were created using it.

Let the K-round throughput of System be the total number of entities arriving
at the target over K rounds, divided by K. We define the average throughput
(henceforth throughput) as the limit of the K-round throughput for large K. All
simulations start at a state where all cells are empty, and entities are
subsequently added to the source cells.
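
The definition of K-round throughput translates directly into a short computation. The sketch below assumes the simulator records the number of entities reaching the target in each round; the function name and the sample arrival log are hypothetical.

```python
from typing import Sequence


def k_round_throughput(arrivals_per_round: Sequence[int], K: int) -> float:
    """Entities arriving at the target over the first K rounds, divided by K."""
    if K <= 0 or K > len(arrivals_per_round):
        raise ValueError("K must be between 1 and the number of logged rounds")
    return sum(arrivals_per_round[:K]) / K


# Hypothetical log: the number of arrivals observed in each of 8 rounds.
arrivals = [0, 0, 1, 0, 1, 1, 0, 1]
short_run = k_round_throughput(arrivals, 4)   # 1 arrival in 4 rounds
long_run = k_round_throughput(arrivals, 8)    # 4 arrivals in 8 rounds
```

The early rounds, during which the first entities are still travelling from the source, lower the estimate for small K. This is why the average throughput is taken for large K.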

Single-color throughput without failures as a function of $r_s$, $l$, $v$. Rough
calculations show that throughput should be proportional to cell velocity $v$, and
inversely proportional to safety distance $r_s$ and entity radius $l$. Figure 12 shows
throughput versus $r_s$ for several choices of $v$ for an 8 × 8 unit square
tessellation instance of System with a single entity color. The parameters are set to
$l = 0.25$ and $K = 2500$. The entities move along a line path where the source
is the bottom left corner cell and the target is the top left corner cell. For the
most part, the expected relationship with $v$ holds: all other factors
remaining the same, a lower velocity makes each entity take longer to move
away from the boundary, which causes the predecessor cell to be blocked more
frequently, and thus fewer entities reach $tid$ from any element of $ID_S$ in the
same number of rounds. In cases with low velocity (for example $v = 0.1$) and
for very small $r_s$, however, the throughput can actually be greater than that at
a slightly higher velocity. We conjecture that this somewhat surprising effect
appears because at very small safety spacing, the potential for safety violation
is higher with faster speeds, and therefore there are many more blocked cells
per round. We also observe that the throughput saturates at a certain value of
$r_s$ (≈ 0.55). This situation arises when there is roughly only one entity in each
cell.

Single-color throughput without failures as a function of the path. For a sufficiently 
large number of rounds K, throughput is independent of the length of the 
path. This of course varies based on the particular path and instance of System 
considered, but all other variables fixed, this relationship is observed. More in- 
teresting however, is the relationship between throughput and path complex- 
ity, measured in the number of turns along a path. Figure 13 shows through- 
put versus the number of turns along paths of length 8. This illustrates that 
throughput decreases as the number of turns increases, up to a point at which 
the decrease in throughput saturates. This saturation is due to signaling and 
indicates that there is only one entity per cell. 
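
The path complexity measure used here is simple to compute. The sketch below assumes a path on a square tessellation given as a list of (row, column) cell coordinates, which is a representation chosen for this example. A turn is counted whenever the direction of travel changes between consecutive steps.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def count_turns(path: List[Cell]) -> int:
    """Number of direction changes along a path of adjacent grid cells."""
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(1 for s, t in zip(steps, steps[1:]) if s != t)


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
l_shaped = [(0, 0), (1, 0), (1, 1), (1, 2)]
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
turns = [count_turns(p) for p in (straight, l_shaped, staircase)]
```

For these three paths, `turns` is `[0, 1, 3]`.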

Single-color throughput under failure and recovery of cells. Finally, we considered
a random failure and recovery model in which at each round each non-faulty
cell fails with some probability $p_f$ and each faulty cell recovers with some
probability $p_r$ [33]. A recovery sets $failed_i = false$ and in the case of $tid$ also resets
$dist_{tid} = 0$, so that eventually Route will correct $next_j$ and $dist_j$ for any $j \in TC$.
Intuitively, we expect that throughput will decrease as $p_f$ increases and
increase as $p_r$ increases. Figure 14 demonstrates this result for $0.01 < p_f < 0.05$
and $0.05 < p_r < 0.2$. There are diminishing returns on increasing $p_r$: for a fixed
$p_f$, each further increase in $p_r$ results in a smaller throughput gain.
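
A minimal sketch of this failure and recovery model is given below, assuming each cell is an independent two-state chain. The function names are hypothetical, and the sketch models only the `failed` flags, not routing or entity movement. For such a chain, the long-run fraction of rounds in which a cell is non-faulty is $p_r / (p_f + p_r)$, which gives a rough sense of why the gain from raising $p_r$ flattens out.

```python
import random
from typing import List


def step_failures(failed: List[bool], p_f: float, p_r: float,
                  rng: random.Random) -> List[bool]:
    """One round: non-faulty cells fail w.p. p_f, faulty cells recover w.p. p_r."""
    return [
        (rng.random() >= p_r) if is_failed else (rng.random() < p_f)
        for is_failed in failed
    ]


def stationary_nonfaulty_fraction(p_f: float, p_r: float) -> float:
    """Long-run fraction of rounds a single cell spends non-faulty."""
    return p_r / (p_f + p_r)


rng = random.Random(0)            # fixed seed so a run can be repeated
state = [False] * 64              # an 8 x 8 partition with no failed cells
for _ in range(1000):
    state = step_failures(state, p_f=0.05, p_r=0.2, rng=rng)

fractions = [stationary_nonfaulty_fraction(0.05, p_r) for p_r in (0.05, 0.1, 0.2)]
```

With $p_f = 0.05$, the three values in `fractions` are approximately 0.50, 0.67, and 0.80, so each doubling of $p_r$ buys less availability than the previous one. Throughput also depends on routing and on which cells fail, so this is only a partial explanation.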

Multi-color throughput as a function of the number of intersecting cells. We now
discuss throughput in the multi-color case. In the case where the paths
between different sources and targets do not overlap, all the results from the 
single-color simulation results apply. In the case where the paths do overlap, 
the mutual exclusion algorithm runs to ensure no deadlocks occur. This addi- 
tional control logic will have an influence on the throughput. For the multi- 
color cases, we consider the summed throughput, which is the sum of the 
throughputs for each color. 

Figure 16 shows the roughly exponential decrease in throughput as the frac- 
tion of overlapping paths increases for two colors with path length 8 and no 
turns. The fraction of overlapping paths is defined as the number of vertices in 
the color-shared cells $CSC(x, c)$. As the fraction increases, the paths lie
completely on top of one another, so in this case with path length 8, we have no
overlap, 1 cell overlap, etc. 

Figure 14: Throughput versus failure rate $p_f$ for several recovery rates $p_r$ with an initial path of length 8, where $K = 20000$, $r_s = 0.05$, $l = 0.2$, and $v = 0.2$ for System with an 8 × 8

Figure 15: Throughput versus increasing path length of square (blue) and equilateral triangular (red) partitions.

Figure 16: Throughput versus fraction of path overlap for two colors on a 1 × 16 unit square tessellation.

Figure 17: Throughput versus number of overlapping colors on a 1 × 3 unit square tessellation.



Multi-color throughput as a function of the number of intersecting colors.
Intersections (that is, having at least one color-shared cell) have a fixed cost on
throughput. Specifically, the summed throughput with two overlapping colors
on a cell is the same as the summed throughput with three or more. Figure 17
shows this fixed decrease in throughput as the number of overlapping colors
increases for a fixed path of length 3 with 3 color-shared cells, where
throughput decreases by a factor of about 4.5 from having no overlaps to having
one color overlapping. Once there are two colors, additional colors do not
decrease throughput further. This observation agrees with intuition: the decrease in
throughput due to an intersection is independent of the number of destinations
for the entities that must pass through that intersection.
6. Related Work



There is a large amount of work on traffic control in transportation systems 
(see, e.g., [4, 34]) and robotics (see, e.g., [35]). We briefly summarize some of the 
more related work, but highlight that we are presenting a formal model of an 
example of such systems. Distributed air and automotive traffic control have 
been studied in many contexts. Human-factors issues are considered in [36, 37] 
to ensure collision avoidance between the coordination of numerous pilots and 
a supervisory controller modeling the semi-centralized air traffic control
components. The Small Aircraft Transportation System (SATS) protocol is a semi-distributed
air traffic control protocol designed for small airports without radar, so pilots
and their aircraft coordinate among themselves to land after being assigned a 
landing sequence order by an automated system at the airport [16]. SATS has 
been formally modeled and analyzed using a combination of model checking 
and automated theorem proving [38]. SATS and this paper share an abstrac- 
tion: the physical environment is a priori partitioned into a set of regions of 
interest, and properties about the whole system are proved using composi- 
tional analysis. Safe conflict resolution maneuvers for distributed air traffic 
control are designed in [39]. A formal model of the traffic collision avoidance
system (TCAS) is developed and analyzed for safety in [40]. TCAS is a system
deployed on aircraft that alerts pilots when other aircraft are in close proximity 
and guides them along safe trajectories. 

A distributed algorithm (executed by entities, vehicles in this case) for con- 
trolling automotive intersections without any stop signs is presented in [18]. 
Some methods for ensuring liveness for automotive intersections are presented 
in [41]. A method to detect the mode of a hybrid system control model of an 
autonomous vehicle in intersections is developed in [42], and is used to reduce 
conservatism of the maximally controlled invariant set (the set of collision- 
free controls). Efficient distributed intersection control algorithms are devel- 
oped in [43]. There is a large amount of work on flocking [44] and platoon- 
ing [45, 46, 47, 48]. Only a few works consider failures in such systems, like the 
arbitrary failures considered in [49, 50], the actuator failures considered in [48], 
or in synchronization of swarm robot systems in [51].

Distributed robot coordination on discrete abstractions like [52, 23, 53, 54, 
55, 56, 57] can be viewed as traffic control. For instance, [23] establishes a
formal connection between the continuous and the discrete parts of these
protocols, and also presents a self-stabilizing algorithm whose analysis is similar
to the analysis in this paper. These works also decompose the continuous problem
into a discrete abstraction by partitioning the environment, but all these works 
allow at most a single entity (robot) in each partition, while our framework 
allows numerous entities in each partition. If several entities are to visit some 
destination in [53, 56, 57], like our targets here, that destination is represented 
as the union of a set of partitions and each entity must reside in one of these 
partitions. 

The Kiva Systems robotic warehouse [52] is a robotic traffic control system 
on square partitions, and can be described in our framework by allowing a 
single entity per cell. In these warehouse systems, there is a central coordinator
scheduling tasks, but the robots are responsible for path planning using an A*- 
like search algorithm [52]. However, several deadlock scenarios are identified 
when performing such path planning [54]. The Adaptive Highways Algorithm 
presented in [54] for scheduling entities relies on using the tentative trajectories 
of other robots collected by the central controller. Deadlocks are also observed
in other distributed robotics path-planning algorithms on discrete partitions 
in [58]. Deadlock scenarios can also arise without a discrete abstraction, such 
as in the doorways considered in [59], the path formation algorithms of [60], or 
the warehouse automation system of [61]. 

Lastly, we mention that most of these works on traffic control from aviation, 
automotive, swarm robotics, and warehouse automation applications can be 
modeled within the framework of spatial computing [62, 63, 64]. 

7. Discussion 

In this section, we discuss some ways to generalize assumptions used in 
the paper and some alternative methods. In this paper, we presented a dis- 
tributed traffic control algorithm for the partitioned plane, which moves en- 
tities without collision to their destinations, in spite of failures. While our 
algorithm is presented for two-dimensional partitions, an extension to some 
three-dimensional partitions (e.g., cubes and tetrahedra) follows in an obvious 
way. An extension to the more general case where there are multiple sources 
and multiple targets of each color — and entities of each color move toward the 
nearest target of that color — is straightforward, but complicates notation. 

Self-Stabilizing Mutual Exclusion and Distributed Snapshot Algorithms. There are 
a variety of mutual exclusion algorithms that could be used to determine locks 
(Figure 9, line 13). For this paper, we require the overall system to be stabi- 
lizing and therefore the locking algorithm itself should be stabilizing. To this 
end, any of the following algorithms could be adapted to our framework: the 
token circulation algorithm [65], mutual exclusion [66], group mutual
exclusion [67], the snap-stabilizing propagation of information with feedback (PIF)
algorithm [68], or $k$-out-of-$\ell$ mutual exclusion [69]. A self-stabilizing distributed
snapshot algorithm (see [27, Ch. 5]) can be used to determine if all c color- 
shared cells are empty, after having had some entity of color c (Figure 9, line 19). 
If all cells are empty, then another round of mutual exclusion commences, ex- 
cluding color c from the input set. 

General Triangulations and Affine Dynamics. We assumed in Section 2 that the 
partitions satisfy several geometric assumptions for feasibility of entity trans- 
fers. We considered using vector fields generated by a discrete abstraction like 
those presented in [70, 71, 72, 73]. The affine vector fields generated on sim- 
plices in [70, 73] can be used to move an entity (with potentially nonholo- 
nomic or nonlinear dynamics) through any side of a cell in a triangulation 
(simplex) [70, 72] or rectangle [71]. However, it turns out that it is impossible to 
maintain our notion of safety for such vector fields without additional collision
avoidance mechanisms implemented on each entity. This is due to a simple ge- 
ometric observation — moving entities through a shorter side than the side they 
entered through may require the entities to come closer together. For example, 
if a cell in the triangulation has an obtuse angle, then the vector field gener- 
ated by [70] flowing from the longest edge to the shortest edge has negative 
divergence. Furthermore, a vector field having negative divergence implies 
the flow corresponding to any two distinct points starting in that field come 
closer together, hence safety cannot be maintained. The distributed problems 
using these discrete abstractions [53, 56, 57] avoid this by requiring at most one 
entity in any (triangular) partition at a time. 
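
The geometric observation can be illustrated numerically. The sketch below uses a hypothetical linear field $F(x, y) = (-a x, -a y)$, whose divergence is $-2a < 0$. It is chosen for simplicity and is not the field generated by [70]. Two points carried by this field move closer together, so a separation that is safe initially need not remain safe.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def flow(p: Point, a: float, dt: float, steps: int) -> Point:
    """Forward-Euler flow of the linear field F(x, y) = (-a*x, -a*y)."""
    x, y = p
    for _ in range(steps):
        x, y = x + dt * (-a * x), y + dt * (-a * y)
    return (x, y)


p0, q0 = (1.0, 0.0), (0.0, 1.0)
p1 = flow(p0, a=0.5, dt=0.01, steps=100)
q1 = flow(q0, a=0.5, dt=0.01, steps=100)

gap_before = math.dist(p0, q0)
gap_after = math.dist(p1, q1)
```

After the flow, `gap_after` is smaller than `gap_before`. In this example the gap shrinks by the factor $(1 - a\,dt)^{steps}$, roughly 0.61 for the parameters shown.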

We also mention a simple condition to ensure that triangulations have the 
required geometric partition properties (Assumptions 1 and 2). If all the trian- 
gles in the triangulation are non-obtuse, then the triangulation satisfies these 
assumptions. We also note that restricting allowable triangulations of an en- 
vironment to ones without obtuse angles is not restrictive, since any polygon 
can be efficiently partitioned into a triangulation with non-obtuse [74, 75] or 
acute [76] angles. 
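
A candidate triangulation can be screened for this condition with a few dot products: the angle at a vertex is obtuse exactly when the two edge vectors leaving that vertex have a negative dot product. The helper below is a sketch for illustration. The function name and tolerance are assumptions, and degenerate (collinear) triangles are not handled specially.

```python
from typing import Tuple

Point = Tuple[float, float]


def is_non_obtuse(a: Point, b: Point, c: Point, tol: float = 1e-12) -> bool:
    """True if no interior angle of triangle abc exceeds 90 degrees."""
    def dot_at(p: Point, q: Point, r: Point) -> float:
        # Dot product of the two edge vectors leaving vertex p.
        return (q[0] - p[0]) * (r[0] - p[0]) + (q[1] - p[1]) * (r[1] - p[1])

    return all(d >= -tol for d in (dot_at(a, b, c), dot_at(b, c, a), dot_at(c, a, b)))


right_ok = is_non_obtuse((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
obtuse_bad = is_non_obtuse((0.0, 0.0), (4.0, 0.0), (1.0, 0.5))
```

Here `right_ok` is `True` (a right angle is allowed) and `obtuse_bad` is `False`. A triangulation passes the screen if every triangle in it does.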

Insufficiency of Disjoint Paths. Finding disjoint paths, such as by using the al- 
gorithms from [77, 78, 79, 80], could be another approach to solving the multi- 
color problem, but the locking mechanism used here solves a more general 
problem. Even without failures, there are many environments and choices of 
sources and targets for which there are no disjoint paths between sources and 
targets. One such environment is shown in Figure 4, where for two distinct 
colors c and d, the paths between the respective sources and targets
necessarily overlap, so an algorithm for finding disjoint paths cannot be used.
However, there are disjoint
paths in some cases, so no scheduling would be necessary if these are found, 
but our routing algorithm does not necessarily find these, as the disjoint paths 
may not be shortest distance. A self-stabilizing algorithm for finding disjoint 
paths on planar graphs would be an enhancement to our algorithm, as it would 
increase throughput in the case that paths need not overlap. 

Back-Pressure and Wormhole Routing. Back-pressure routing [81, 82] is an al- 
gorithm for dynamically routing traffic over an underlying graph using con- 
gestion gradients. If we view the color of each entity as its intended address 
and consider this problem from the perspective of queuing theory, one might 
think back-pressure routing could provide a throughput-optimal solution for 
the problem. However, our physical motion model is incompatible with back- 
pressure routing. For a given cell, our model does not allow arbitrary choice 
of the next neighbor for each entity on that cell. In particular, when one cell 
moves its entities toward a neighboring cell, all entities sufficiently near the 
shared side between the two neighbors would transfer. 

Wormhole routing [83] is a flow control policy over a fixed underlying
graph for determining when packets move to the next node on the graph.
Addresses in wormhole routing are very short and come at the beginning of a
packet, so a packet can be subdivided into pieces, or flits, and begin being
forwarded after the address is received, yielding a snake-like sequence of flits in
transfer. One could also view the sequence of entities on a path toward the
appropriately-colored target (see Figure 19) as a sequence of flits flowing to a
destination in wormhole routing. While similar deadlock scenarios can arise in
our system and wormhole routing, wormhole routing is incompatible with our
system due to the motion model, just like back-pressure routing.

Figure 18: Hexagonal partition that does not satisfy the projection property (Assumption 1). An extension to allow such partitions would require enlarging the transfer region and receiving a signal from all of the potential next neighbors, which would require cells 3 and 7 both to signal cell 4 to move.

Figure 19: Example system on a parallelogram partition with failed cells in black. The several turns along the path from the source to the target cause a saturation of entities on cells 6, 10, and 14. The movement vector $u(i, j)$ is defined as the unit vector parallel to the x axis for movement between horizontal neighbors, and the unit vector parallel to the vertical sides of the parallelograms between vertical neighbors.

8. Conclusion 

We presented a self-stabilizing distributed traffic control protocol for the 
partitioned plane, where each partition controls the motion of all entities within 
that partition. The algorithm guarantees separation between entities in the 
face of crash failures of the software controlling a partition. Once new failures 
cease occurring, it guarantees progress of all entities that are neither isolated 
by (a) failed partitions, nor (b) cells with entities of other colors that become 
deadlocked due to failures, to the respective targets. Through simulations, we 
presented estimates of throughput as a function of velocity, minimum sepa- 
ration, single-target path complexity, failure-recovery rates, and multi-target 
path complexity. 

It would be interesting to develop strategies allowing entities of different 
colors on a single cell. Our strategy of preventing entities of different colors 
from residing on a single cell simplified some analysis, but it also complicated 
some analysis, by making it harder to prove progress because deadlock sce- 
narios may frequently arise. It would be interesting to develop algorithms al- 
lowing mixing and sorting of colors using different types of motion coupling. 
It would also be interesting to design algorithms that relax the
assumption on which failures may occur while still ensuring liveness. We believe this
would require a more complex routing algorithm to temporarily move entities
of some colors off the color-shared cells, thus allowing some other color on the
color-shared cells to make progress.